syntax analysis - ecourse2.ccu.edu.tw
TRANSCRIPT
![Page 1: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/1.jpg)
1
Syntax Analysis
![Page 2: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/2.jpg)
2
Syntax Analysis
Introduction to parsers
Context-free grammars
Push-down automata
Top-down parsing
Buttom-up parsing
Bison - a parser generator
![Page 3: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/3.jpg)
3
Introduction to parsers
Lexical
AnalyzerParser
Symbol
Table
token
next token
source Semantic
Analyzersyntax
treecode
![Page 4: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/4.jpg)
4
Context-Free Grammars
A set of terminals: basic symbols from
which sentences are formed
A set of nonterminals: syntactic
categories denoting sets of sentences
A set of productions: rules specifying
how the terminals and nonterminals can
be combined to form sentences
The start symbol: a distinguished
nonterminal denoting the language
![Page 5: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/5.jpg)
5
An Example
Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
Nonterminals: expr, op
Productions:
expr expr op expr
expr ‘(’ expr ‘)’
expr ‘-’ expr
expr id
op ‘+’ | ‘-’ | ‘*’ | ‘/’
The start symbol: expr
![Page 6: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/6.jpg)
6
Derivations
A derivation step is an application of a
production as a rewriting rule
E - E
A sequence of derivation steps
E - E - ( E ) - ( id )
is called a derivation of “- ( id )” from E
The symbol * denotes “derives in zero or
more steps”; the symbol + denotes “derives
in one or more steps
E * - ( id ) E + - ( id )
![Page 7: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/7.jpg)
7
Context-Free Languages
A context-free language L(G) is the language
defined by a context-free grammar G
A string of terminals is in L(G) if and only
if S + , is called a sentence of G
If S * , where may contain nonterminals,
then we call a sentential form of G
E - E - ( E ) - ( id )
G1 is equivalent to G2 if L(G1) = L(G2)
![Page 8: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/8.jpg)
8
Left- & Right-most Derivations
Each derivation step needs to choose
– a nonterminal to rewrite
– a production to apply
A leftmost derivation always chooses the
leftmost nonterminal to rewrite
E lm - E lm - ( E ) lm - ( E + E )
lm - ( id + E ) lm - ( id + id )
A rightmost derivation always chooses the
rightmost nonterminal to rewrite
E rm - E rm - ( E ) rm - ( E + E )
rm - (E + id ) rm - ( id + id )
![Page 9: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/9.jpg)
9
Parse Trees
A parse tree is a graphical representation
for a derivation that filters out the order of
choosing nonterminals for rewriting
Many derivations may correspond to the
same parse tree, but every parse tree has
associated with it a unique leftmost and a
unique rightmost derivation
![Page 10: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/10.jpg)
10
An Example
E
-
( )
+
id id
E
E E
EE
lm - E
lm - ( E )
lm - ( E + E )
lm - ( id + E )
lm - ( id + id )
E
rm - E
rm - ( E )
rm - ( E + E )
rm - ( E + id )
rm - ( id + id )
![Page 11: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/11.jpg)
11
Ambiguous Grammar
A grammar is ambiguous if it produces
more than one parse tree for some sentence
E
E + E
id + E
id + E * E
id + id * E
id + id * id
E
E * E
E + E * E
id + E * E
id + id * E
id + id * id
![Page 12: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/12.jpg)
12
Ambiguous Grammar
E
+E E
id
id
*E E
id
E
*E E
id
id
+E E
id
![Page 13: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/13.jpg)
13
Resolving Ambiguity
Use disambiguiting rules to throw away
undesirable parse trees
Rewrite grammars by incorporating
disambiguiting rules into grammars
![Page 14: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/14.jpg)
14
An Example
The dangling-else grammar
stmt if expr then stmt
| if expr then stmt else stmt
| other
Two parse trees for
if E1 then if E2 then S1 else S2
![Page 15: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/15.jpg)
15
An Example
S
elseE S Sif then
if E then S
elseE
S
S Sif then
if E then S
![Page 16: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/16.jpg)
16
Disambiguiting Rules
Rule: match each else with the closest
previous unmatched then
Remove undesired state transitions in the
pushdown automaton
![Page 17: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/17.jpg)
17
Grammar Rewriting
stmt m_stmt
| unm_stmt
m_stmt if expr then m_stmt else m_stmt
| other
unm_stmt if expr then stmt
| if expr then m_stmt else unm_stmt
![Page 18: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/18.jpg)
18
RE vs. CFG
Every language described by a RE can also
be described by a CFG
Why use REs for lexical syntax?
– do not need a notation as powerful as CFGs
– are more concise and easier to understand than
CFGs
– More efficient lexical analyzers can be
constructed from REs than from CFGs
– Provide a way for modularizing the front end
into two manageable-sized components
![Page 19: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/19.jpg)
19
Push-Down Automata
Finite Automaton
Input
OutputStack
$
$
![Page 20: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/20.jpg)
20
An Example
S’ S $
S a S b
S
1 2 3start (a, $)
a
(b, a)
a
($, $)
(a, a)
a
(b, a)
a
0
($, $)
![Page 21: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/21.jpg)
21
Nonregular Constructs
REs can denote only a fixed number of
repetitions or an unspecified number of
repetitions of one given construct:
an, a*
A nonregular construct:
– L = {anbn | n 0}
![Page 22: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/22.jpg)
22
Non-Context-Free Constructs
CFGs can denote only a fixed number of
repetitions or an unspecified number of
repetitions of one or two given constructs
Some non-context-free constructs:
– L1 = {wcw | w is in (a | b)*}
– L2 = {anbmcndm | n 1 and m 1}
– L3 = {anbncn | n 0}
![Page 23: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/23.jpg)
共勉
子貢曰︰貧而無諂,富而無驕,何如。
子曰︰可也,未若貧而樂,富而好禮者也。
子貢曰︰詩云︰「如切如磋,如琢如磨。」
其斯之謂與。
子曰︰賜也,始可與言詩已矣;
告諸往而知來者。
--論語
![Page 24: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/24.jpg)
24
Top-Down Parsing
Construct a parse tree from the root to the
leaves using leftmost derivation
1. S c A B input: cad
2. A a b
3. A a
4. B d
S S
c A B
1S
c A B
a b
2S
c A B
a d
4S
c A B
a
3
backtrack
![Page 25: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/25.jpg)
25
Predictive Parsing
A top-down parsing without backtracking
– there is only one alternative production to
choose at each derivation step
stmt if expr then stmt else stmt
| while expr do stmt
| begin stmt_list end
![Page 26: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/26.jpg)
26
LL(k) Parsing
The first L stands for scanning the input
from left to right
The second L stands for producing a
leftmost derivation
The k stands for the number of lookahead
input symbols used to choose alternative
productions at each derivation step
![Page 27: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/27.jpg)
27
LL(1) Parsing
Use one input symbol of lookahead
LL(1): S a b e | c d e
LL(2): S a b e | a d e
![Page 28: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/28.jpg)
28
Bottom-Up Parsing
Construct a parse tree from the leaves to the
root using rightmost derivation in reverse
S a A B e input: abbcde
A A b c | b
B d
ca d eb
A
b
A
ca d eb
A
b
BA
ca d eb
A
b
S
BA
ca d eb
A
bca d ebb
abbcde rm aAbcde rm aAde rm aABe rm S
![Page 29: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/29.jpg)
29
Handles
A handle of a right-sentential form consists of
– a production A
– a position of where can be replaced by A to
produce the previous right-sentential form in a
rightmost derivation of
abbcde rm aAbcde rm aAde rm aABe rm S
A b A A b c B d S a A B e
![Page 30: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/30.jpg)
30
Handle Pruning
rm A rm S
S
A
The string to the right of the handle contains only
terminals
A is the bottommost leftmost interior node with all
its children in the tree
![Page 31: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/31.jpg)
31
An Example
S
S
BA
ca d eb
A
b
S
BA
ca d eb
A
S
BA
a d e
S
BA
a e
![Page 32: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/32.jpg)
32
Shift-Reduce Parsing
Parsing driver
Parsing table
Input
Output
Stack
Handle
$
$
![Page 33: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/33.jpg)
33
Stack Operations
Shift: shift the next input symbol onto the
top of the stack
Reduce: replace the handle at the top of the
stack with the corresponding nonterminal
Accept: announce successful completion of
the parsing
Error: call an error recovery routine
![Page 34: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/34.jpg)
34
An Example
Action Stack Input
S $ a b b c d e $
S $ a b b c d e $
R $ a b b c d e $
S $ a A b c d e $
S $ a A b c d e $
R $ a A b c d e $
S $ a A d e $
R $ a A d e $
S $ a A B e $
R $ a A B e $
A $ S $
![Page 35: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/35.jpg)
35
Shift/Reduce Conflict
stmt if expr then stmt
| if expr then stmt else stmt
| other
Stack Input
$ - - - if expr then stmt else - - - $
Shift if expr then stmt else stmt
Reduce if expr then stmt
![Page 36: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/36.jpg)
36
Reduce/Reduce Conflict
stmt id ( para_list )
stmt expr := expr
para_list para_list , para
para_list para
para id
expr id ( expr_list )
expr id
expr_list expr_list , expr
expr_list expr
Stack Input
$ - - - id ( id , id ) - - - $
$- - - procid ( id , id ) - - - $
![Page 37: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/37.jpg)
37
LR(k) Parsing
The L stands for scanning the input from
left to right
The R stands for constructing a rightmost
derivation in reverse
The k stands for the number of lookahead
input symbols used to make parsing
decisions
![Page 38: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/38.jpg)
38
LR Parsing
The LR parsing algorithm
Constructing SLR(1) parsing tables
Constructing LR(1) parsing tables
Constructing LALR(1) parsing tables
![Page 39: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/39.jpg)
39
Model of an LR Parser
Parsing driver
Input
Output
Stack
Action Goto
Sm
Sm-1
Xm-1
Xm
S0Parsing table
$
$
![Page 40: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/40.jpg)
40
An Example
(1) E E + T
(2) E T
(3) T T * F
(4) T F
(5) F ( E )
(6) F id
State Action Goto
id + * ( ) $ E T F
0 s5 s4 1 2 3
1 s6 acc
2 r2 s7 r2 r2
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
![Page 41: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/41.jpg)
41
An Example
Action Stack Input
s5 $0 id + id * id $
r6 $0 id5 + id * id $
r4 $0 F3 + id * id $
r2 $0 T2 + id * id $
s6 $0 E1 + id * id $
s5 $0 E1 +6 id * id $
r6 $0 E1 +6 id5 * id $
r4 $0 E1 +6 F3 * id $
s7 $0 E1 +6 T9 * id $
s5 $0 E1 +6 T9 *7 id $
r6 $0 E1 +6 T9 *7 id5 $
r3 $0 E1 +6 T9 *7 F10 $
r1 $0 E1 +6 T9 $
acc $0 E1 $
![Page 42: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/42.jpg)
42
LR Parsing Driver
push $s0 onto the stack, where s0 is the initial state
set ip to point to the first symbol of w$;
repeat
let s be the top state on the stack and a the symbol pointed to by ip;
if action[s, a] == shift s’ then
push a and s’ onto the stack and advance ip
else if action[s, a] == reduce A then
pop 2 * | | symbols off the stack; s’ = goto[top(), A];
push A and s’ onto the stack
else if action[s, a] == accept then
return
else error
until false
![Page 43: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/43.jpg)
43
First Sets
The first set of a string is the set of
terminals that begin the strings derived from
. If * , then is also in the first set of
.
![Page 44: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/44.jpg)
44
First Sets
If X is terminal, then FIRST(X) is {X}
If X is nonterminal and X is a
production, then add to FIRST(X)
If X is nonterminal and X Y1 Y2 ... Yk is
a production, then add a to FIRST(X) if
for some i, a is in FIRST(Yi) and is in all
of FIRST(Y1), ..., FIRST(Yi-1). If is in
FIRST(Yj) for all j, then add to FIRST(X)
![Page 45: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/45.jpg)
45
An Example
E T E'
E' + T E' |
T F T'
T' * F T' |
F ( E ) | id
FIRST(F) = { (, id }
FIRST(T') = { *, }, FIRST(T) = { (, id }
FIRST(E') = { +, }, FIRST(E) = { (, id }
![Page 46: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/46.jpg)
46
Follow Sets
The follow set of a nonterminal A is the set of
terminals that can appear immediately to the
right of A in some sentential form, namely,
S * A a
a is in the follow set of A.
![Page 47: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/47.jpg)
47
Follow Sets
Place $ in FOLLOW(S), where S is the start
symbol and $ is the input right endmarker
If there is a production A B , then
everything in FIRST() except for is placed
in FOLLOW(B)
If there is a production A B or A B
where FIRST() contains , then everything
in FOLLOW(A) is in FOLLOW(B)
![Page 48: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/48.jpg)
48
An Example
E T E'
E' + T E' |
T F T'
T' * F T' |
F ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, }, FIRST(T') = { *, }
FOLLOW(E) = { ), $ }, FOLLOW(E') = { ), $ }
FOLLOW(T) = { +, ), $ }, FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { *, +, ), $ }
![Page 49: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/49.jpg)
49
SLR(1) Parsing Table
LR(0) Items
• An LR(0) item of a grammar in G is a production of G with a dot at some position of the right-hand side, A
• An LR(0) item represents a state in an NPDA indicating how much of a production we have seen at a given point in the parsing process
• The production A X Y Z yields the following four LR(0) items
A • X Y Z, A X • Y Z, A X Y • Z, A X Y Z •
![Page 50: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/50.jpg)
50
From CFG to NPDA
• The state A a will go to the state A
a via an edge of terminal a (a shifting)
• The state A B will go to the state B
via an edge of the empty string
• The state A will cause a reduction on
seeing a terminal in FOLLOW(A)
• The state A B will go to the state A
B via an edge of nonterminal B (after a
reduction)
![Page 51: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/51.jpg)
51
An Example
1. E’ E
2. E E + T
3. E T
4. T T * F
5. T F
6. F ( E )
7. F id
Augmented grammar:Easier to identify
the accepting state
![Page 52: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/52.jpg)
52
An Example
E’•E
0
E
E’E•
7
T
ET•
9
F
TF•
11
E•E+T
1
E•T
2
T•T*F
3
T•F
4
EE•+T
8E
EE+•T
14+
17
EE+T•T
F•(E)
5
F•id
6 Fid•
13id
F(•E)
12(
TT•*F
10
TTT*•F*
15
TT*F•F
18
E F(E•)
16)
F(E)•
19
5
6
1
2
3
4
![Page 53: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/53.jpg)
53
From NPDA to DPDA
• There are two functions performed on sets
of LR(0) items (DPDA states)
• The function closure(I) adds more items to I
when there is a dot to the left of a
nonterminal (corresponding to edges)
• The function goto(I, X) moves the dot past
the symbol X in all items in I that contain X
(corresponding to non- edges)
![Page 54: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/54.jpg)
54
The Closure Function
function closure(I);
begin
J := I;
repeat
for each item A B in J and
each production B of G such that
B is not in J do
J = J { B }
until no more items can be added to J;
return J
end
![Page 55: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/55.jpg)
55
An Example
1. E’ E
2. E E + T
3. E T
4. T T * F
5. T F
6. F ( E )
7. F id
s0 = E’ E,
I0 = closure({s0 }) =
{ E’ E,
E E + T,
E T,
T T * F,
T F,
F ( E ),
F id }
![Page 56: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/56.jpg)
56
The Goto Function
function goto(I, X);
begin
set J to the empty set
for any item A X in I do
add A X to J
return closure(J)
end
![Page 57: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/57.jpg)
57
An Example
I0 = {E’ E, E E + T, E T,
T T * F, T F, F ( E ),
F id }
goto(I0 , E)
= closure({E’ E , E E + T })
= {E’ E , E E + T }
![Page 58: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/58.jpg)
58
Subset Construction
function items(G’);
begin
C := {closure({S’ S})}
repeat
for each set of items I in C and each symbol X do
J := goto(I, X)
if J is not empty and not in C then
C = C { J }
until no more sets of items can be added to C
return C
end
![Page 59: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/59.jpg)
59
An Example
1. E’ E
2. E E + T
3. E T
4. T T * F
5. T F
6. F ( E )
7. F id
![Page 60: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/60.jpg)
60
I0 : E’ E
E E + T
E T
T T * F
T F
F ( E )
F id
goto(I0, E) =
I1 : E’ E
E E + T
goto(I0, T) =
I2 : E T
T T * F
goto(I0, F) =
I3 : T F
goto(I0, ‘(’) =
I4 : F ( E )
E E + T
E T
T T * F
T F
F ( E )
F id
goto(I0, id) =
I5 : F id
goto(I1, ‘+’) =
I6 : E E + T
T T * F
T F
F ( E )
F id
goto(I2, ‘*’) =
I7 : T T * F
F ( E )
F id
goto(I4, E) =
I8 : F ( E )
E E + T
goto(I6, T) =
I9 : E E + T
T T * F
goto(I7, F) =
I10 : T T * F
goto(I8, ‘)’) =
I11 : F ( E )
![Page 61: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/61.jpg)
61
An Example
E’ • E
E • E + T
E • T
T • T * F
T • F
F • ( E )
F • id
E’ E •
E E • + T
E T •
T T • * F
E T
T F •
F
F ( • E )
E • E + T
E • T
T • T * F
T • F
F • ( E )
F • id
F id •id
(
T T * • F
F • ( E )
F • id*
E E + • T
T • T * F
T • F
F • ( E )
F • id
+
F ( E • )
E E • + T
F T T * F •
E E + T •
T T • * FT
F ( E ) •
)
0
1
2
3
4
5
6
7
8
9
10
11
(id *
+
id
ET
F
F(
(id
![Page 62: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/62.jpg)
62
SLR(1) Parsing Table Generation
procedure SLR(G’);
begin
for each state I in items(G’) do begin
if A a in I and goto(I, a) = J for a terminal a then
action[I, a] = “shift J”
if A in I and A S’ then
action[I, a] = “reduce A ” for all a in Follow(A)
if S’ S in I then action[I, $] = “accept”
if A X in I and goto(I, X) = J for a nonterminal X
then goto[I, X] = J
end
all other entries in action and goto are made error
end
![Page 63: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/63.jpg)
63
An Example
+ * ( ) id $ E T F
0 s4 s5 1 2 3
1 s6 a
2 r3 s7 r3 r3
3 r5 r5 r5 r5
4 s4 s5 8 2 3
5 r7 r7 r7 r7
6 s4 s5 9 3
7 s4 s5 10
8 s6 s11
9 r2 s7 r2 r2
10 r4 r4 r4 r4
11 r6 r6 r6 r6
![Page 64: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/64.jpg)
64
共勉
大學之道︰
在明明德,在親民,在止於至善。
--大學
![Page 65: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/65.jpg)
65
LR(1) Parsing Table
LR(1) Items
• An LR(1) item of a grammar in G is a pair,
( A , a ), of an LR(0) item A and a lookahead symbol a
• The lookahead has no effect in an LR(1) item
of the form ( A , a ), where is not
• An LR(1) item of the form ( A , a )
calls for a reduction by A only if the next
input symbol is a
![Page 66: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/66.jpg)
66
The Closure Function
function closure(I);
begin
J := I;
repeat
for each item (A B , a) in J and
each production B of G and
each b FIRST( a) such that
(B , b) is not in J do
J = J { (B , b) }
until no more items can be added to J;
return J
end
![Page 67: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/67.jpg)
67
The Goto Function
function goto(I, X);
begin
set J to the empty set
for any item (A X , a) in I do
add (A X , a) to J
return closure(J)
end
![Page 68: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/68.jpg)
68
Subset Construction
function items(G’);
begin
C := {closure({S’ S, $})}
repeat
for each set of items I in C and each symbol X do
J := goto(I, X)
if J is not empty and not in C then
C = C { J }
until no more sets of items can be added to C
return C
end
![Page 69: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/69.jpg)
69
An Example
1. S’ S
2. S C C
3. C c C
4. C d
![Page 70: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/70.jpg)
70
An Example
I0: closure({(S’ S, $)}) =
(S’ S, $)
(S C C, $)
(C c C, c/d)
(C d, c/d)
I1: goto(I0, S) = (S’ S , $)
I2: goto(I0, C) =
(S C C, $)
(C c C, $)
(C d, $)
I3: goto(I0, c) =
(C c C, c/d)
(C c C, c/d)
(C d, c/d)
I4: goto(I0, d) =
(C d , c/d)
I5: goto(I2, C) =
(S C C , $)
![Page 71: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/71.jpg)
71
An Example
I6: goto(I2, c) =
(C c C, $)
(C c C, $)
(C d, $)
I7: goto(I2, d) =
(C d , $)
I8: goto(I3, C) =
(C c C , c/d)
: goto(I3, c) = I3
: goto(I3, d) = I4
I9: goto(I6, C) =
(C c C , $)
: goto(I6, c) = I6
: goto(I6, d) = I7
![Page 72: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/72.jpg)
72
LR(1) Parsing Table Generation
procedure LR(G’);
begin
for each state I in items(G’) do begin
if (A a , b) in I and goto(I, a) = J for a terminal a then
action[I, a] = “shift J”
if (A , a) in I and A S’ then
action[I, a] = “reduce A ”
if (S’ S , $) in I then action[I, $] = “accept”
if (A X , a) in I and goto(I, X) = J for a nonterminal X
then goto[I, X] = J
end
all other entries in action and goto are made error
end
![Page 73: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/73.jpg)
73
An Example
c d $ S C
0 s3 s4 1 2
1 a
2 s6 s7 5
3 s3 s4 8
4 r4 r4
5 r2
6 s6 s7 9
7 r4
8 r3 r3
9 r3
![Page 74: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/74.jpg)
74
LALR(1) Parsing Table
The Core of LR(1) Items
• The core of a set of LR(1) items is the set of
their first components (i.e., LR(0) items)
• The core of the set of LR(1) items
{ (C c C, c/d),
(C c C, c/d),
(C d, c/d) }
is {C c C,
C c C,
C d }
![Page 75: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/75.jpg)
75
Merging Cores
I3: { (C c C, c/d)
(C c C, c/d)
(C d, c/d) }
I4: { (C d , c/d) }
I8: { (C c C , c/d) }
I6: { (C c C, $)
(C c C, $)
(C d, $) }
I7: { (C d , $) }
I9: { (C c C , $) }
![Page 76: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/76.jpg)
76
LALR(1) Parsing Table Generation
procedure LALR(G’);
begin
for each state I in mergeCore(items(G’)) do begin
if (A a , b) in I and goto(I, a) = J for a terminal a then
action[I, a] = “shift J”
if (A , a) in I and A S’ then
action[I, a] = “reduce A ”
if (S’ S , $) in I then action[I, $] = “accept”
if (A X , a) in I and goto(I, X) = J for a nonterminal X
then goto[I, X] = J
end
all other entries in action and goto are made error
end
![Page 77: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/77.jpg)
77
An Example
c d $ S C
0 s36 s47 1 2
1 a
2 s36 s47 5
36 s36 s47 89
47 r4 r4 r4
5 r2
89 r3 r3 r3
![Page 78: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/78.jpg)
78
LR Grammars
• A grammar is SLR(1) iff its SLR(1) parsing
table has no multiply-defined entries
• A grammar is LR(1) iff its LR(1) parsing
table has no multiply-defined entries
• A grammar is LALR(1) iff its LALR(1)
parsing table has no multiply-defined
entries
![Page 79: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/79.jpg)
79
Hierarchy of Grammar Classes
Unambiguous Grammars Ambiguous Grammars
LL(k) LR(k)
LR(1)
LALR(1)
LL(1) SLR(1)
![Page 80: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/80.jpg)
80
Hierarchy of Grammar Classes
• Why LL(k) LR(k)?
• Why SLR(k) LALR(k) LR(k)?
![Page 81: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/81.jpg)
81
LL(k) vs. LR(k)
• For a grammar to be LL(k), we must be able
to recognize the use of a production by
seeing only the first k symbols of what its
right-hand side derives
• For a grammar to be LR(k), we must be able
to recognize the use of a production by
having seen all of what is derived from its
right-hand side with k more symbols of
lookahead
![Page 82: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/82.jpg)
82
LALR(k) vs. LR(k)
• The merge of the sets of LR(1) items having the
same core does not introduce shift/reduce conflicts
• Suppose there is a shift-reduce conflict on
lookahead a in the merged set because of
1. (A , a) 2. (B a , b)
• Then some set of items has item (A , a) , and
since the cores of all sets merged are the same, it
must have an item (B a , c) for some c
• But then this set has the same shift/reduce conflict
on a
![Page 83: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/83.jpg)
83
LALR(k) vs. LR(k)
• The merge of the sets of LR(1) items having the same core may introduce reduce/reduce conflicts
• As an example, consider the grammar1. S’ S2. S a A d | a B e | b A e | b B d3. A c4. B c
that generates acd, ace, bce, bcd
• The set {(A c , d), (B c , e)} is valid for acx
• The set {(A c , e), (B c , d)} is valid for bcx
• But the union {(A c , d/e), (B c , d/e)} generates a reduce/reduce conflict
![Page 84: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/84.jpg)
84
SLR(k) vs. LALR(k)
1. S’ S
2. S L = R
3. S R
4. L * R
5. L id
6. R L
![Page 85: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/85.jpg)
85
SLR(k) vs. LALR(k)
I0: closure({S’ S}) =
S’ S
S L = R
S R
L * R
L id
R L
I1: goto(I0, S) = S’ S
I2: goto(I0, L) =
S L = R
R L
I3: goto(I0, R) =
S R
I4: goto(I0, *) =
L * R
R L
L * R
L id
I5: goto(I0, id) =
L id
FOLLOW(R) = {=, $}
![Page 86: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/86.jpg)
86
SLR(k) vs. LALR(k)
I6: goto(I2, =) =
S L = R
R L
L * R
L id
I7: goto(I4, R) =
L * R
I8: goto(I4, L) =
R L
I9: goto(I6, R) =
S L = R
![Page 87: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/87.jpg)
87
SLR(k) vs. LALR(k)
I0: closure({(S’ S, $)}) =
(S’ S, $)
(S L = R, $)
(S R, $)
(L * R, =/$)
(L id, =/$)
(R L, $)
I1: goto(I0, S) = (S’ S , $)
I2: goto(I0, L) =
(S L = R, $)
(R L , $)
I3: goto(I0, R) =
(S R , $)
I4: goto(I0, *) =
(L * R, =/$)
(R L, =/$)
(L * R, =/$)
(L id, =/$)
I5: goto(I0, id) =
(L id , =/$)
![Page 88: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/88.jpg)
88
SLR(k) vs. LALR(k)
I6: goto(I2, =) =
(S L = R, $)
(R L, $)
(L * R, $)
(L id, $)
I7: goto(I4, R) =
(L * R , =/$)
I8: goto(I4, L) =
(R L , =/$)
I9: goto(I6, R) =
(S L = R , $)
I10: goto(I6, L) =
(R L , $)
I11: goto(I6, *) =
(L * R, $)
(R L, $)
(L * R, $)
(L id, $)
I12: goto(I6, id) =
(L id , $)
I13: goto(I11, R) =
(L * R , $)
I4
I5
![Page 89: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/89.jpg)
89
Bison – A Parser Generator
Bison compiler
C compiler
a.out
lang.ylang.tab.c
lang.tab.h (-d option)
lang.tab.c a.out
tokens syntax tree
A langauge for specifying parsers and semantic analyzers
![Page 90: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/90.jpg)
90
Bison Programs
%{
C declarations
%}
Bison declarations
%%
Grammar rules
%%
Additional C code
![Page 91: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/91.jpg)
91
An Example
line expr ‘\n’
expr expr ‘+’ term | term
term term ‘*’ factor | factor
factor ‘(’ expr ‘)’ | DIGIT
![Page 92: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/92.jpg)
92
An Example
%token DIGIT
%start line
%%
line : expr ‘\n’ {printf(“line: expr \\n\n”);}
;
expr: expr ‘+’ term {printf(“expr: expr + term\n”);}
| term {printf(“expr: term\n”}
;
term: term ‘*’ factor {printf(“term: term * factor\n”;}
| factor {printf(“term: factor\n”);}
;
factor: ‘(’ expr ‘)’ {printf(“factor: ( expr )\n”);}
| DIGIT {printf(“factor: DIGIT\n”);}
;
![Page 93: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/93.jpg)
93
Functions and Variables
• yyparse(): the parser function
• yylex(): the lexical analyzer function. Bison recognizes any non-positive value as indicating the end of the input
• yylval: the attribute value of a token. Its default type is int, and can be declared to be multiple types in the first section using
%union {int ival;double dval;
}
![Page 94: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/94.jpg)
94
Conflict Resolutions
• A reduce/reduce conflict is resolved by
choosing the production listed first
• A shift/reduce conflict is resolved in favor
of shift
• A mechanism for assigning precedences and
assocoativities to terminals
![Page 95: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/95.jpg)
95
Precedence and Associativity
• The precedence and associativity of operators are
declared simultaneously
%nonassoc ‘<’ /* lowest */
%left ‘+’ ‘-’
%right ‘^’ /* highest */
• The precedence of a rule is determined by the
precedence of its rightmost terminal
• The precedence of a rule can be modified by
adding %prec <terminal> to its right end
![Page 96: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/96.jpg)
96
An Example
%{
#include <stdio.h>
%}
%token NUMBER
%left ‘+’ ‘-’
%left ‘*’ ‘/’
%right UMINUS
%%
![Page 97: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/97.jpg)
97
An Example
line : expr ‘\n’
;
expr: expr ‘+’ expr
| expr ‘-’ expr
| expr ‘*’ expr
| expr ‘/’ expr
| ‘-’ expr %prec UMINUS
| ‘(’ expr ‘)’
| NUMBER
;
![Page 98: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/98.jpg)
98
Error Recovery
• Error recovery is performed via error productions
• An error production is a production containing
the predefined terminal error
• After adding an error production,
A B | error
on encountering an error in the middle of B, the
parser pops symbols from its stack until , shifts
error, and skips input tokens until a token in
FIRST()
![Page 99: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/99.jpg)
99
Error Recovery
• The parser can report a syntax error by calling
the user provided function yyerror(char *)
• The parser will suppress the report of another
error message for 3 tokens
• You can resume error report immediately by
using the macro yyerrok
• Error productions are used for major
nonterminals
![Page 100: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/100.jpg)
100
An Example
line : expr ‘\n’
| error ‘\n’ {yyerror("reenter last line:");
yyerrok;}
;
expr: expr ‘+’ expr
| expr ‘*’ expr
| ‘-’ expr %prec UMINUS
| ‘(’ expr ‘)’
| NUMBER
;
![Page 101: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/101.jpg)
101
共勉
樊遲問仁。
子曰︰「愛人。」
子曰︰「人之過也,各於其黨,
觀過,斯知仁矣。」
--論語
![Page 102: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/102.jpg)
102
共勉
子曰︰唯仁者,能好人,能惡人。
子曰︰茍志於仁矣,無惡也。
--論語
子曰︰志士仁人,無求生以害仁,
有殺身以成仁。
![Page 103: Syntax Analysis - ecourse2.ccu.edu.tw](https://reader034.vdocument.in/reader034/viewer/2022050302/626f05d788db701d5e777856/html5/thumbnails/103.jpg)
103
共勉
子曰︰里仁為美。擇不處仁,焉得知?
子曰︰朝聞道,夕死可矣!
--論語
子曰︰不仁者不可以久處約,
不可以長處樂。仁者安仁,知者利仁。