building a parser i
DESCRIPTION
Building a Parser I. CS164 3:30-5:00 TT 10 Evans. PA2. in PA2, you’ll work in pairs, no exceptions except the exception if odd # of students hate team projects? form a “coalition team” team members work alone, but - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/1.jpg)
Prof. Bodik CS 164 Lecture 5 1
Building a Parser I
CS1643:30-5:00 TT
10 Evans
![Page 2: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/2.jpg)
Prof. Bodik CS 164 Lecture 52
PA2
• in PA2, you’ll work in pairs, no exceptions– except the exception if odd # of students
• hate team projects? form a “coalition team”– team members work alone, but
• discuss design, clarify the handout, keep a common eye on the newsgroup, etc
• share some or all code, at the very least their test cases!
– a win-win proposition:• work mainly alone but hedge your grade• each member submits his/her project, graded separately• score: the lower-scoring team member gets a bonus
equal to half the difference between his and his partner’s score
![Page 3: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/3.jpg)
Prof. Bodik CS 164 Lecture 53
Administrativia
• Section room change– 3113 Etcheverry moving next door, to 3111
Etch.– starting 9/22.
![Page 4: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/4.jpg)
Prof. Bodik CS 164 Lecture 54
Overview
• What does a parser do, again?– its two tasks– parse tree vs. AST
• A hand-written parser– and why it gets hard to get it right
![Page 5: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/5.jpg)
What does a parser do?
![Page 6: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/6.jpg)
Prof. Bodik CS 164 Lecture 56
Recall: The Structure of a Compiler
scanner
parser
checker
code gen
Decaf program (stream of characters)
stream of tokens
Abstract Syntax Tree (AST)
AST with annotations (types, declarations)
maybe x86
![Page 7: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/7.jpg)
Prof. Bodik CS 164 Lecture 57
Recall: Syntactic Analysis
• Input: sequence of tokens from scanner• Output: abstract syntax tree• Actually,
– parser first builds a parse tree– AST is then built by translating the parse tree– parse tree rarely built explicitly; only determined
by, say, how parser pushes stuff to stack– our lectures first focus on constructing the parse
tree; later we’ll show the translation to AST.
![Page 8: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/8.jpg)
Prof. Bodik CS 164 Lecture 58
Example
• Decaf4*(2+3)
• Parser inputNUM(4) TIMES LPAR NUM(2) PLUS
NUM(3) RPAR
• Parser output (AST): *
NUM(4)+
NUM(2) NUM(3)
![Page 9: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/9.jpg)
Prof. Bodik CS 164 Lecture 59
Parse tree for the example
leaves are tokens
NUM(4) TIMES LPAR NUM(2) PLUS NUM(3) RPAR
EXPR
EXPR
EXPR
![Page 10: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/10.jpg)
Prof. Bodik CS 164 Lecture 510
Another example
• Decafif (x == y) { a=1; }
• Parser inputIF LPAR ID EQ ID RPAR LBR ID AS INT
SEMI RBR
• Parser output (AST):IF-THEN
==
ID ID
=
ID INT
![Page 11: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/11.jpg)
Prof. Bodik CS 164 Lecture 511
Parse tree for the example
IF LPAR ID == ID RPAR LBR ID = INT SEMI RBR
EXPR EXPR
STMT
BLOCK
STMT
leaves are tokens
![Page 12: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/12.jpg)
Prof. Bodik CS 164 Lecture 512
Parse tree vs. abstract syntax tree
• Parse tree– contains all tokens, including those that parser
needs “only” to discover• intended nesting: parentheses, curly braces• statement termination: semicolons
– technically, parse tree shows concrete syntax
• Abstract syntax tree (AST)– abstracts away artifacts of parsing, by flattening
tree hierarchies, dropping tokens, etc.– technically, AST shows abstract syntax
![Page 13: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/13.jpg)
Prof. Bodik CS 164 Lecture 513
Comparison with Lexical Analysis
Phase Input Output
Lexer Sequence of characters
Sequence of tokens
Parser Sequence of tokens
AST, built from parse tree
![Page 14: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/14.jpg)
Prof. Bodik CS 164 Lecture 514
Summary
• Parser performs two tasks:
– syntax checking• a program with a syntax error may produce an AST
that’s different than intended by the programmer
– parse tree construction• usually implicit • used to build the AST
![Page 15: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/15.jpg)
How to build a parser for Decaf?
![Page 16: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/16.jpg)
Prof. Bodik CS 164 Lecture 516
Writing the parser
• Can do it all by hand, of course– ok for small languages, but hard for Decaf
• Just like with the scanner, we’ll write ourselves a parser generator– we’ll concisely describe Decaf’s syntactic
structure• that is, how expressions, statements, definitions look
like
– and the generator produces a working parser
• Let’s start with a hand-written parser– to see why we want a parser generator
![Page 17: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/17.jpg)
Prof. Bodik CS 164 Lecture 517
First example: balanced parens
• Our problem: check the syntax– are parentheses in input string balanced?
• The simple language– parenthesized number literals– Ex.: 3, (4), ((1)), (((2))), etc
• Before we look at the parser– why aren’t finite automata sufficient for this
task?
![Page 18: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/18.jpg)
Prof. Bodik CS 164 Lecture 518
Why can’t DFA/NFA’s find syntax errors?
•When checking balanced parentheses, FA’s can either
– accept all correct (i.e., balanced) programs but also some incorrect ones, or
– reject all incorrect programs but also reject some correct ones.
•Problem: finite state– can’t count parens seen so
far
( )
)
()
( )
)
(
)
)
![Page 19: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/19.jpg)
Prof. Bodik CS 164 Lecture 519
Parser code preliminaries
• Let TOKEN be an enumeration type of tokens:– INT, OPEN, CLOSE, PLUS, TIMES, NUM, LPAR,
RPAR
• Let the global in[] be the input string of tokens
• Let the global next be an index in the token string
![Page 20: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/20.jpg)
Prof. Bodik CS 164 Lecture 520
Parsers use stack to implement infinite state
Balanced parentheses parser:
void Parse() {nextToken = in[next++];if (nextToken == NUM) return;
if (nextToken != LPAR) print(“syntax error”);Parse();if (in[next++] != RPAR) print(“syntax error”);
}
![Page 21: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/21.jpg)
Prof. Bodik CS 164 Lecture 521
Where’s the parse tree constructed?
• In this parser, the parse is given by the call tree:• For the input string (((1))) :
Parse()
Parse()
Parse()
Parse()
1
(
(
( )
)
)
![Page 22: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/22.jpg)
Prof. Bodik CS 164 Lecture 522
Second example: subtraction expressions
The language of this example: 1, 1-2, 1-2-3, (1-2)-3, (2-(3-4)), etc
void Parse() {if (in[++next] == NUM) {
if (in[++next] == MINUS) { Parse(); }} else if (in[next] == LPAR) {
Parse();if (in[++next] != RPAR) print(“syntax error”);
} else print(“syntax error”);}
![Page 23: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/23.jpg)
Prof. Bodik CS 164 Lecture 523
Subtraction expressions continued
• Observations:– a more complex language
• hence, harder to see how the parser works (and if it works correctly at all)
– the parse tree is actually not really what we want• consider input 3-2-1• what’s undesirable about this parse tree’s structure?
-3 Parse()
-
1
2 Parse()
Parse()
![Page 24: Building a Parser I](https://reader035.vdocument.in/reader035/viewer/2022062809/56815861550346895dc5bddc/html5/thumbnails/24.jpg)
Prof. Bodik CS 164 Lecture 524
We need a clean syntactic description
• Just like with the scanner, writing the parser by hand is painful and error-prone– consider adding +, *, / to the last example!
• So, let’s separate the what and the how– what: the syntactic structure, described with a
context-free grammar– how: the parser, which reads the grammar, the
input and produces the parse tree