UNIVERSITY OF SOUTH CAROLINA
Department of Computer Science and Engineering
CSCE 330 Programming Language Structures
Chapter 3: Lexical and Syntactic Analysis
Fall 2009
Marco [email protected]
Syntactic sugar causes cancer of the semicolon. A. Perlis
Contents
• 3.1 Chomsky Hierarchy
• 3.2 Lexical Analysis
• 3.3 Syntactic Analysis
3.1 Chomsky Hierarchy
• Regular grammar -- least powerful
• Context-free grammar (BNF)
• Context-sensitive grammar
• Unrestricted grammar
Regular Grammar
• Simplest; least powerful
• Equivalent to:
– Regular expression
– Finite-state automaton
• Right regular grammar: with ω ∈ T* and B ∈ N, every production has the form A → ω B or A → ω
Example
• Integer → 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9
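The Integer grammar above can be followed almost literally in code. The sketch below is only an illustration: the class name IntegerGrammar and its methods are hypothetical, and each recursive call corresponds to one use of the production Integer → d Integer.

```java
class IntegerGrammar {
    // Integer → d Integer | d, where d is one of the digits 0..9.
    static boolean isInteger(String s) {
        return derives(s, 0);
    }

    // True if the suffix of s starting at pos can be derived from Integer.
    private static boolean derives(String s, int pos) {
        if (pos >= s.length()) return false;      // Integer never derives the empty string
        char c = s.charAt(pos);
        if (c < '0' || c > '9') return false;     // every production starts with a digit
        if (pos == s.length() - 1) return true;   // Integer → d
        return derives(s, pos + 1);               // Integer → d Integer
    }

    public static void main(String[] args) {
        System.out.println(isInteger("2009"));  // true
        System.out.println(isInteger("20x9"));  // false
        System.out.println(isInteger(""));      // false
    }
}
```

Because the nonterminal only ever appears at the right end of a production, the recursion is a tail call and could equally be written as a loop, which is what a finite-state automaton does.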
Regular Grammars
• Left regular grammar: equivalent
• Used in construction of tokenizers (scanners, lexers)
• Less powerful than context-free grammars
• Not a regular language: { aⁿ bⁿ | n ≥ 1 }, i.e., cannot balance: ( ), { }, begin end
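To see why aⁿ bⁿ needs more than a finite number of states, here is a minimal sketch of a recognizer (the class and method names are hypothetical). It has to count, and the counter can grow without bound, which is exactly what a finite-state automaton cannot do.

```java
class Balanced {
    // Recognizes { a^n b^n | n >= 1 } with an unbounded counter.
    static boolean isAnBn(String s) {
        int i = 0;
        int count = 0;
        while (i < s.length() && s.charAt(i) == 'a') {  // count the leading a's
            count++;
            i++;
        }
        if (count == 0) return false;                   // n must be at least 1
        while (i < s.length() && s.charAt(i) == 'b') {  // cancel one a per b
            count--;
            i++;
        }
        return i == s.length() && count == 0;           // all input used, counts agree
    }

    public static void main(String[] args) {
        System.out.println(isAnBn("aabb"));  // true
        System.out.println(isAnBn("aab"));   // false
        System.out.println(isAnBn("abab"));  // false
    }
}
```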
Context-free Grammars
• BNF is a stylized form of CFG
• Equivalent to a pushdown automaton
• For a wide class of unambiguous CFGs, there are table-driven, linear-time parsers
Context-Sensitive Grammars
• Production: α → β, with |α| ≤ |β|
• α, β ∈ (N ∪ T)*
• i.e., left-hand side can be composed of strings of terminals and nonterminals
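Undecidable Properties of CSGs
• Given a string ω and grammar G: ω ∈ L(G) (strictly, membership is decidable when every production satisfies |α| ≤ |β|; it becomes undecidable for unrestricted grammars)
• L(G) is non-empty
• Defn: Undecidable means that you cannot write a computer program that is guaranteed to halt to decide the question for all L(G).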
Unrestricted Grammar
• Equivalent to:
– Turing machine
– von Neumann machine
– C++, Java
• That is, can compute any computable function.
3.2 Lexical Analysis
• Purpose: transform program representation
• Input: printable ASCII characters
• Output: tokens
• Discard: whitespace, comments
• Defn: A token is a logically cohesive sequence of characters representing a single symbol.
Example Tokens
• Identifiers
• Literals: 123, 5.67, 'x', true
• Keywords: bool char ...
• Operators: + - * / ...
• Punctuation: ; , ( ) { }
Other Sequences
• Whitespace: space tab
• Comments: // any-char* end-of-line
• End-of-line
• End-of-file
Why a Separate Phase?
• Simpler, faster machine model than parser
• 75% of time spent in lexer for non-optimizing compiler
• Differences in character sets
• End-of-line convention differs
Regular Expressions
RegExpr      Meaning
x            a character x
\x           an escaped character, e.g., \n
{ name }     a reference to a name
M | N        M or N
M N          M followed by N
M*           zero or more occurrences of M
M+           one or more occurrences of M
M?           zero or one occurrence of M
[aeiou]      the set of vowels
[0-9]        the set of digits
.            any single character
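Most of these operators can be tried directly with java.util.regex. This is a small sketch only: the class RegexDemo is hypothetical, and the { name } reference form belongs to lexer-generator notation and has no direct equivalent in java.util.regex, so it is left out.

```java
import java.util.regex.Pattern;

class RegexDemo {
    // True if the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("ab*", "abbb"));      // M*  : true
        System.out.println(matches("ab+", "a"));         // M+  : false, needs at least one b
        System.out.println(matches("-?[0-9]+", "-42"));  // M?  : true
        System.out.println(matches("[aeiou]", "e"));     // set : true
        System.out.println(matches("a|b", "c"));         // M|N : false
        System.out.println(matches("a.c", "abc"));       // .   : true
    }
}
```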
Clite Lexical Syntax
Category     Definition
anyChar      [ -~]
Letter       [a-zA-Z]
Digit        [0-9]
Whitespace   [ \t]
Eol          \n
Eof          \004
Keyword      bool | char | else | false | float | if | int | main | true | while
Identifier   {Letter}({Letter} | {Digit})*
integerLit   {Digit}+
floatLit     {Digit}+\.{Digit}+
charLit      '{anyChar}'
Operator     = | || | && | == | != | < | <= | > | >= | + | - | * | / | ! | [ | ]
Separator    ; | , | { | } | ( | )
Comment      // ({anyChar} | {Whitespace})* {Eol}
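A few of these categories can be checked with java.util.regex, substituting the named definitions by hand. This is a sketch under assumptions: the class CliteLexicalSyntax and the method classify are hypothetical, and keywords are tested before identifiers because every keyword also matches the Identifier pattern.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class CliteLexicalSyntax {
    static final String LETTER = "[a-zA-Z]";
    static final String DIGIT = "[0-9]";

    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "bool", "char", "else", "false", "float", "if", "int", "main", "true", "while"));
    static final Pattern IDENTIFIER = Pattern.compile(LETTER + "(" + LETTER + "|" + DIGIT + ")*");
    static final Pattern INTEGER_LIT = Pattern.compile(DIGIT + "+");
    static final Pattern FLOAT_LIT = Pattern.compile(DIGIT + "+\\." + DIGIT + "+");
    static final Pattern CHAR_LIT = Pattern.compile("'[ -~]'");

    // Names the category of a complete lexeme, or "unknown" if none matches.
    static String classify(String lexeme) {
        if (KEYWORDS.contains(lexeme)) return "Keyword";
        if (IDENTIFIER.matcher(lexeme).matches()) return "Identifier";
        if (INTEGER_LIT.matcher(lexeme).matches()) return "integerLit";
        if (FLOAT_LIT.matcher(lexeme).matches()) return "floatLit";
        if (CHAR_LIT.matcher(lexeme).matches()) return "charLit";
        return "unknown";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"while", "x1", "123", "5.67", "'x'", "5."}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```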
Generators
• Input: usually regular expression
• Output: table (slow), code
• C/C++: Lex, Flex
• Java: JLex
Finite State Automata
• Set of states: representation – graph nodes
• Input alphabet + unique end symbol
• State transition function: labelled (using alphabet) arcs in graph
• Unique start state
• One or more final states
Deterministic FSA
• Defn: A finite state automaton is deterministic if for each state and each input symbol, there is at most one outgoing arc from the state labeled with the input symbol.
• A Finite State Automaton for Identifiers
Definitions
• A configuration of an FSA consists of a state and the remaining input.
• A move consists of traversing the arc exiting the state that corresponds to the leftmost input symbol, thereby consuming it. If there is no such arc, then:
– If no input remains and the state is final, then accept.
– Otherwise, error.
• An input is accepted if, starting with the start state, the automaton consumes all the input and halts in a final state.
Example
• (S, a2i$) ⊢ (I, 2i$)
  ⊢ (I, i$)
  ⊢ (I, $)
  ⊢ (F, )
• Thus: (S, a2i$) ⊢* (F, )
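The sequence of configurations above can be reproduced with a small deterministic automaton. This is a sketch under assumptions: the states S, I and F are taken from the trace, the arcs (letter from S to I, letter or digit from I to I, $ from I to F) are inferred from it, and the class IdentifierDfa is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class IdentifierDfa {
    enum State { S, I, F }   // start, inside an identifier, final

    // Transition function; null means there is no arc for this symbol.
    static State move(State state, char c) {
        boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        boolean digit = c >= '0' && c <= '9';
        switch (state) {
            case S: return letter ? State.I : null;
            case I: return (letter || digit) ? State.I : (c == '$' ? State.F : null);
            default: return null;   // no arcs leave the final state
        }
    }

    // Lists every configuration (state, remaining input) reached from the start state.
    static List<String> trace(String input) {
        List<String> configs = new ArrayList<>();
        State state = State.S;
        int pos = 0;
        configs.add("(" + state + ", " + input.substring(pos) + ")");
        while (pos < input.length()) {
            State next = move(state, input.charAt(pos));
            if (next == null) break;   // stuck: no arc for the leftmost symbol
            state = next;
            pos++;
            configs.add("(" + state + ", " + input.substring(pos) + ")");
        }
        return configs;
    }

    // Accepts if all input is consumed and the automaton halts in the final state.
    static boolean accepts(String input) {
        List<String> configs = trace(input);
        return configs.get(configs.size() - 1).equals("(F, )");
    }

    public static void main(String[] args) {
        System.out.println(trace("a2i$"));   // [(S, a2i$), (I, 2i$), (I, i$), (I, $), (F, )]
        System.out.println(accepts("2ai$")); // false
    }
}
```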
Some Conventions
• Explicit terminator used only for program as a whole, not each token.
• An unlabeled arc represents any other valid input symbol.
• Recognition of a token ends in a final state.
• Recognition of a non-token transitions back to start state.
• Recognition of end symbol (end of file) ends in a final state.
• Automaton must be deterministic.
– Drop keywords; handle separately.
– Must consider all sequences with a common prefix together.
Lexer Code
• Parser calls lexer whenever it needs a new token.
• Lexer must remember where it left off.
• Greedy consumption goes 1 character too far
– peek function
– pushback function
– no symbol consumed by start state
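The pushback option can be illustrated with java.io.PushbackReader, whose unread method returns one character to the stream. The sketch below is not the lexer developed on the following slides: the class PushbackDemo and its two methods are hypothetical and only show the "one character too far" problem.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class PushbackDemo {
    private static boolean isDigit(int c) {
        return c >= '0' && c <= '9';
    }

    // Reads a maximal run of digits. The character that ends the run has
    // already been consumed, so it is pushed back for the next token.
    static String readDigits(PushbackReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && isDigit(c)) {
            sb.append((char) c);
        }
        if (c != -1) in.unread(c);   // went one character too far
        return sb.toString();
    }

    // Splits text into digit runs and single non-digit characters.
    static List<String> scan(String text) {
        List<String> out = new ArrayList<>();
        try (PushbackReader in = new PushbackReader(new StringReader(text))) {
            int c;
            while ((c = in.read()) != -1) {
                if (isDigit(c)) {
                    in.unread(c);            // let readDigits see the first digit
                    out.add(readDigits(in));
                } else {
                    out.add(String.valueOf((char) c));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scan("123+45"));  // [123, +, 45]
    }
}
```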
From Design to Code
private char ch = ' ';
public Token next() {
    do {
        switch (ch) {
            ...
        }
    } while (true);
}
Remarks
• Loop only exited when a token is found.
• Loop exited via a return statement.
• Variable ch must be global. Initialized to a space character.
• Exact nature of a Token irrelevant to design.
Translation Rules
• Traversing an arc from A to B:
– If labeled with x: test ch == x
– If unlabeled: else/default part of if/switch. If only arc, no test need be performed.
– Get next character if A is not start state
• A node with an arc to itself is a do-while.
– Condition corresponds to whichever arc is labeled.
• Otherwise the move is translated to an if/switch:
– Each arc is a separate case.
– Unlabeled arc is default case.
• A sequence of transitions becomes a sequence of translated statements.
• A complex diagram is translated by boxing its components so that each box is one node.
– Translate each box using an outside-in strategy.
// Tests the parameter c (the original tested the field ch by mistake).
private boolean isLetter(char c) {
    return c >= 'a' && c <= 'z' ||
           c >= 'A' && c <= 'Z';
}
private String concat(String set) {
    StringBuffer r = new StringBuffer("");
    do {
        r.append(ch);
        ch = nextChar();
    } while (set.indexOf(ch) >= 0);
    return r.toString();
}
public Token next() {
    do {
        if (isLetter(ch)) { // ident or keyword
            String spelling = concat(letters + digits);
            return Token.keyword(spelling);
        } else if (isDigit(ch)) { // int or float literal
            String number = concat(digits);
            if (ch != '.')
                return Token.mkIntLiteral(number);
            number += concat(digits);
            return Token.mkFloatLiteral(number);
        } else switch (ch) {
            case ' ': case '\t': case '\r': case eolnCh:
                ch = nextChar(); break;
            case eofCh: return Token.eofTok;
            case '+': ch = nextChar();
                return Token.plusTok;
            …
            case '&': check('&'); return Token.andTok;
            case '=': return chkOpt('=', Token.assignTok,
                                    Token.eqeqTok);
Source
// a first program
// with 2 comments
int main ( ) {
    char c;
    int i;
    c = 'h';
    i = c + 3;
} // main
Tokens
• int
• main
• (
• )
• {
• char
• Identifier c
• ;
JLex: A Lexical Analyzer Generator for Java
Definition of tokens (regular expressions) → JLex → Java file: scanner class (recognizes tokens)
We will look at an example JLex specification (adopted from the manual).
Consult the manual for details on how to write your own JLex specifications.
The JLex tool
Layout of JLex file:
user code (added to start of generated file)
%%
options
%{
user code (added inside the scanner class declaration)
%}
macro definitions
%%
lexical declaration
• User code is copied directly into the output class.
• JLex directives allow you to include code in the lexical analysis class, change names of various components, switch on character counting, line counting, manage EOF, etc.
• Macro definitions give names to useful regexps.
• Regular expression rules define the tokens to be recognised and the actions to be taken.
java.io.StreamTokenizer
• An alternative to JLex is to use the class StreamTokenizer from java.io
• The class recognizes 4 types of lexical elements (tokens):
– number (sequence of decimal digits, optionally starting with the - (minus) sign and/or containing the decimal point)
– word (sequence of letters and digits starting with a letter)
– line separator
– end of file
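The four token types can be observed with a few lines of code. This is a minimal sketch: the class StreamTokenizerDemo and the textual labels are hypothetical, while nextToken, ttype, nval, sval, eolIsSignificant and the TT_ constants are part of java.io.StreamTokenizer. Any other character, such as =, is reported as itself in ttype.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class StreamTokenizerDemo {
    // Returns a readable description of every token in the text.
    static List<String> tokens(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.eolIsSignificant(true);   // report line separators as tokens
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_NUMBER: out.add("number " + st.nval); break;
                    case StreamTokenizer.TT_WORD:   out.add("word " + st.sval);   break;
                    case StreamTokenizer.TT_EOL:    out.add("eol");               break;
                    default:                        out.add("char " + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        out.add("eof");
        return out;
    }

    public static void main(String[] args) {
        // [word count, char =, number -3.5, eol, word x1, eof]
        System.out.println(tokens("count = -3.5\nx1"));
    }
}
```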
Parsing
• Some terminology
• Different types of parsing strategies
– bottom up
– top down
• Recursive descent parsing
– What is it
– How to implement one given an EBNF specification
– (How to generate one using tools – later)
• (Bottom up parsing algorithms)
Parsing: Some Terminology
• Recognition: To answer the question “does the input conform to the syntax of the language?”
• Parsing: Recognition + determination of phrase structure (for example by generating AST data structures)
• (Un)ambiguous grammar: A grammar is unambiguous if there is at most one way to parse any input (i.e., for a syntactically correct program there is precisely one parse tree)
Different kinds of Parsing Algorithms
• Two big groups of algorithms can be distinguished:
– bottom up strategies
– top down strategies
• Example parsing of “Micro-English”
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
Example sentences:
The cat sees the rat.
The rat sees me.
I like a cat
The rat like me.
I see the rat.
I sees a rat.
Top-down parsing
The parse tree is constructed starting at the top (root).
[Figure: parse tree for "The cat sees a rat ." built from the root: Sentence → Subject Verb Object . ; Subject → The Noun; Noun → cat; Verb → sees; Object → a Noun; Noun → rat]
Bottom up parsing
The parse tree “grows” from the bottom (leaves) up to the top (root).
[Figure: parse tree for "The cat sees a rat ." built from the leaves: cat → Noun; The Noun → Subject; sees → Verb; rat → Noun; a Noun → Object; Subject Verb Object . → Sentence]
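The bottom-up idea can be sketched as a hand-written shift-reduce loop for Micro-English. This is an illustration under assumptions, not a table-driven LR parser: the class MicroEnglishShiftReduce is hypothetical, input words are assumed to be spelled exactly as in the grammar (so "the", not "The"), nonterminals are written in angle brackets, and the stack depth is used to decide whether "a Noun" reduces to Subject or Object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MicroEnglishShiftReduce {
    static final List<String> NOUNS = Arrays.asList("cat", "mat", "rat");
    static final List<String> VERBS = Arrays.asList("like", "is", "see", "sees");
    static final List<String> SENTENCE_BODY =
            Arrays.asList("<Subject>", "<Verb>", "<Object>", ".");

    // Applies one reduction to the top of the stack, if any rule fits.
    static boolean reduce(List<String> stack) {
        int n = stack.size();
        if (n == 0) return false;
        String top = stack.get(n - 1);
        if (NOUNS.contains(top)) { stack.set(n - 1, "<Noun>"); return true; }
        if (VERBS.contains(top)) { stack.set(n - 1, "<Verb>"); return true; }
        if (top.equals("I") && n == 1) { stack.set(0, "<Subject>"); return true; }
        if (top.equals("me") && n == 3) { stack.set(2, "<Object>"); return true; }
        if (top.equals("<Noun>") && n >= 2
                && (stack.get(n - 2).equals("a") || stack.get(n - 2).equals("the"))) {
            stack.remove(n - 1);                          // pop <Noun>
            stack.remove(n - 2);                          // pop the article
            stack.add(n == 2 ? "<Subject>" : "<Object>"); // first phrase is the subject
            return true;
        }
        if (stack.equals(SENTENCE_BODY)) {
            stack.clear();
            stack.add("<Sentence>");
            return true;
        }
        return false;
    }

    // Shifts one terminal at a time and reduces as long as possible.
    static boolean parse(String... terminals) {
        List<String> stack = new ArrayList<>();
        for (String t : terminals) {
            stack.add(t);                 // shift
            while (reduce(stack)) { }     // reduce
        }
        return stack.equals(Arrays.asList("<Sentence>"));
    }

    public static void main(String[] args) {
        System.out.println(parse("the", "cat", "sees", "a", "rat", "."));  // true
        System.out.println(parse("the", "cat", "sees", "rat", "."));       // false
    }
}
```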
Top-Down vs. Bottom-Up parsing
LL analysis (top-down): Left-to-right scan, Leftmost derivation
• Scans string left to right
• Builds leftmost derivation
• Uses look-ahead to choose the next derivation step
LR analysis (bottom-up): Left-to-right scan, Rightmost derivation
• Scans string left to right
• Builds rightmost derivation (in reverse)
• Uses look-ahead to choose the next reduction
Recursive Descent Parsing
• Recursive descent parsing is a straightforward top-down parsing algorithm.
• We will now look at how to develop a recursive descent parser from an EBNF specification.
• Idea: the parse tree structure corresponds to the “call graph” structure of parsing procedures that call each other recursively.
Recursive Descent Parsing
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
Define a procedure parseN for each non-terminal N
private void parseSentence();
private void parseSubject();
private void parseObject();
private void parseNoun();
private void parseVerb();
Recursive Descent Parsing
public class MicroEnglishParser {
    private TerminalSymbol currentTerminal;
    // Auxiliary methods will go here
    ...
    // Parsing methods will go here
    ...
}
Recursive Descent Parsing: Auxiliary Methods
public class MicroEnglishParser {
    private TerminalSymbol currentTerminal;
    private void accept(TerminalSymbol expected) {
        if (currentTerminal matches expected)
            currentTerminal = next input terminal;
        else
            report a syntax error
    }
    ...
}
Recursive Descent Parsing: Parsing Methods
Sentence ::= Subject Verb Object .
private void parseSentence() {
    parseSubject();
    parseVerb();
    parseObject();
    accept('.');
}
Recursive Descent Parsing: Parsing Methods
Subject ::= I | a Noun | the Noun
private void parseSubject() {
    if (currentTerminal matches 'I')
        accept('I');
    else if (currentTerminal matches 'a') {
        accept('a');
        parseNoun();
    } else if (currentTerminal matches 'the') {
        accept('the');
        parseNoun();
    } else
        report a syntax error
}
Recursive Descent Parsing: Parsing Methods
Noun ::= cat | mat | rat
private void parseNoun() {
    if (currentTerminal matches 'cat')
        accept('cat');
    else if (currentTerminal matches 'mat')
        accept('mat');
    else if (currentTerminal matches 'rat')
        accept('rat');
    else
        report a syntax error
}
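Putting the pieces together, a runnable recognizer for Micro-English might look like the sketch below. Several things here are assumptions rather than part of the slides: terminals are plain strings instead of TerminalSymbol objects, a syntax error is reported by throwing IllegalStateException, the helper acceptOneOf and the marker END are hypothetical, and parseObject and parseVerb are written by analogy with the methods shown above.

```java
import java.util.Arrays;
import java.util.List;

class MicroEnglishRecognizer {
    private static final String END = "<end>";   // hypothetical end-of-input marker
    private final List<String> input;
    private int pos = 0;
    private String currentTerminal;

    MicroEnglishRecognizer(String... terminals) {
        input = Arrays.asList(terminals);
        currentTerminal = input.isEmpty() ? END : input.get(0);
    }

    // Consumes the current terminal if it is the expected one.
    private void accept(String expected) {
        if (!currentTerminal.equals(expected)) {
            throw new IllegalStateException(
                    "expected " + expected + ", found " + currentTerminal);
        }
        pos++;
        currentTerminal = pos < input.size() ? input.get(pos) : END;
    }

    // Consumes the current terminal if it is one of the alternatives.
    private void acceptOneOf(String... alternatives) {
        for (String alt : alternatives) {
            if (currentTerminal.equals(alt)) { accept(alt); return; }
        }
        throw new IllegalStateException("unexpected " + currentTerminal);
    }

    // Sentence ::= Subject Verb Object .
    private void parseSentence() { parseSubject(); parseVerb(); parseObject(); accept("."); }

    // Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        if (currentTerminal.equals("I")) accept("I");
        else { acceptOneOf("a", "the"); parseNoun(); }
    }

    // Object ::= me | a Noun | the Noun
    private void parseObject() {
        if (currentTerminal.equals("me")) accept("me");
        else { acceptOneOf("a", "the"); parseNoun(); }
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() { acceptOneOf("cat", "mat", "rat"); }

    // Verb ::= like | is | see | sees
    private void parseVerb() { acceptOneOf("like", "is", "see", "sees"); }

    // True if the terminals form exactly one sentence of Micro-English.
    static boolean recognizes(String... terminals) {
        MicroEnglishRecognizer parser = new MicroEnglishRecognizer(terminals);
        try {
            parser.parseSentence();
            return parser.currentTerminal.equals(END);
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(recognizes("the", "cat", "sees", "a", "rat", "."));  // true
        System.out.println(recognizes("I", "sees", "a", "rat", "."));           // true
        System.out.println(recognizes("the", "cat", "sees", "rat", "."));       // false
    }
}
```

Note that "I sees a rat ." is accepted: the grammar describes syntax only and says nothing about subject-verb agreement.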
Algorithm to convert EBNF into a RD parser
• The conversion of an EBNF specification into a Java implementation for a recursive descent parser is so “mechanical” that it can easily be automated!
  => JavaCC “Java Compiler Compiler”
• We can describe the algorithm by a set of mechanical rewrite rules
N ::= X
private void parseN() {
    parse X
}
Algorithm to convert EBNF into a RD parser
parse ε
    // a dummy statement
parse t where t is a terminal
    accept(t);
parse N where N is a non-terminal
    parseN();
parse XY
    parse X
    parse Y
Algorithm to convert EBNF into a RD parser
parse X*
    while (currentToken.kind is in starters[X]) {
        parse X
    }
parse X|Y
    switch (currentToken.kind) {
        cases in starters[X]:
            parse X
            break;
        cases in starters[Y]:
            parse Y
            break;
        default:
            report syntax error
    }
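As an illustration of the X* and X|Y rules, here is a sketch for a small grammar that is not part of the slides: Sum ::= Term (('+' | '-') Term)* and Term ::= digit | '(' Sum ')'. The class SumParser, the END marker and the single-character tokens are all assumptions. The starter test of the while loop and the branches on the current character play the role of starters[X]; an if chain is used in place of a switch, and the parser also computes the value of the sum so the result can be checked.

```java
class SumParser {
    private static final char END = '\0';   // hypothetical end-of-input marker
    private final String input;
    private int pos = 0;

    SumParser(String input) { this.input = input; }

    private char current() {
        return pos < input.length() ? input.charAt(pos) : END;
    }

    private void accept(char expected) {
        if (current() != expected) {
            throw new IllegalStateException("syntax error at position " + pos);
        }
        pos++;
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Sum ::= Term (('+' | '-') Term)*
    private int parseSum() {
        int value = parseTerm();
        while (current() == '+' || current() == '-') {   // starters of the repeated part
            char op = current();
            accept(op);
            int rhs = parseTerm();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // Term ::= digit | '(' Sum ')'
    private int parseTerm() {
        char c = current();
        if (isDigit(c)) {                 // starters of the first alternative
            accept(c);
            return c - '0';
        }
        if (c == '(') {                   // starter of the second alternative
            accept('(');
            int value = parseSum();
            accept(')');
            return value;
        }
        throw new IllegalStateException("syntax error at position " + pos);
    }

    // Parses the whole text as a Sum and returns its value.
    static int evaluate(String text) {
        SumParser parser = new SumParser(text);
        int value = parser.parseSum();
        parser.accept(END);               // nothing may follow the Sum
        return value;
    }

    static boolean isValid(String text) {
        try {
            evaluate(text);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1+2-3"));        // 0
        System.out.println(evaluate("(1+2)-(3-4)"));  // 4
        System.out.println(isValid("1+"));            // false
    }
}
```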