lexical analysis - lecture 2 sections 3.1 - 3 -...
LexicalAnalysis
Robb T.Koether
LexicalAnalysis
RegularExpressions
StateDiagrams
Assignment
Lexical Analysis
Lecture 2
Sections 3.1 - 3.4
Robb T. Koether
Hampden-Sydney College
Mon, Jan 19, 2009
Outline
1 Lexical Analysis
2 Regular Expressions
3 State Diagrams
4 Assignment
Tokens
A token has a type and a value.
Types include id, num, assign, lparen, etc.
Values are used primarily with identifiers and numbers.
If we read “count”, the type is id and the value is “count”.
If we read “123”, the type is num and the value is “123”.
If we read “=”, the type is assign and the value is “=”.
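As a minimal sketch of the type/value pairing described above, a token can be modeled as a small record. The class name `Token` and its field names are assumptions made for illustration; the slides do not prescribe a representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    type: str   # token type, e.g. "id", "num", "assign", "lparen"
    value: str  # the text that was read


# The three examples from the slide.
tokens = [Token("id", "count"), Token("num", "123"), Token("assign", "=")]

for tok in tokens:
    print(f"{tok.type:<7}{tok.value!r}")
```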
Analyzing Tokens
Each type of token can be described by a regular expression.
Therefore, the set of all tokens can be described by a regular expression. (Why?)
Every language described by a regular expression is accepted by some DFA.
Therefore, the set of all tokens can be processed and accepted by a DFA.
Regular Expressions
The set of all regular expressions may be defined in two parts.
The basic part:
ε represents the language {ε}.
a represents the language {a} for every a ∈ Σ.
Call these languages L(ε) and L(a), respectively.
Regular Expressions
The recursive part: Let r and s denote regular expressions.
r | s represents the language L(r) ∪ L(s).
rs represents the language L(r)L(s).
r∗ represents the language L(r)∗.
In other words,
L(r | s) = L(r) ∪ L(s).
L(rs) = L(r)L(s).
L(r∗) = L(r)∗.
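The three operations can be tried out on small finite languages. The following is only a sketch: languages are modeled as Python sets of strings, the function names are illustrative, and since L(r)∗ is infinite in general, `star` takes a hypothetical `max_reps` bound and returns only the strings built from at most that many repetitions.

```python
def union(lang1, lang2):
    """L(r | s) = L(r) ∪ L(s)."""
    return lang1 | lang2


def concat(lang1, lang2):
    """L(rs) = L(r)L(s): every string of lang1 followed by every string of lang2."""
    return {x + y for x in lang1 for y in lang2}


def star(lang, max_reps):
    """Finite approximation of L(r)∗: at most max_reps repetitions."""
    result = {""}   # zero repetitions gives the empty string ε
    layer = {""}
    for _ in range(max_reps):
        layer = concat(layer, lang)
        result |= layer
    return result


a, b = {"a"}, {"b"}
print(sorted(union(a, b)))
print(sorted(concat(a, b)))
print(sorted(star(union(a, b), 2), key=lambda w: (len(w), w)))
```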
Example
Example (Identifiers)
Identifiers in C++ can be represented by a regular expression.
r = A | B | · · · | Z | a | b | · · · | z
s = 0 | 1 | · · · | 9
t = r(r | s)∗
Regular Expressions
Definition (Regular definition)
A regular definition of a regular expression is a “grammar” of the form
d₁ → r₁
d₂ → r₂
...
dₙ → rₙ
where each rᵢ is a regular expression over Σ ∪ {d₁, d₂, . . . , dᵢ₋₁}.
Regular Expressions
Note that this definition does not allow recursively defined tokens.
In other words, dᵢ cannot be defined in terms of dᵢ, not even indirectly.
Example
Example (Identifiers)
We may now describe C++ identifiers as follows.
letter → A | B | · · · | Z | a | b | · · · | z
digit → 0 | 1 | · · · | 9
id → letter(letter | digit)∗
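A regular definition maps naturally onto building a pattern from named pieces. The sketch below uses Python's `re` module and string substitution as a stand-in for the definition above; the variable and function names are illustrative. Like the slide, it leaves out the underscore, which real C++ identifiers also allow.

```python
import re

# Each name is defined only in terms of earlier names, as in a regular definition.
letter = "[A-Za-z]"
digit = "[0-9]"
ident = f"{letter}(?:{letter}|{digit})*"   # id → letter(letter | digit)∗

ID_RE = re.compile(ident)


def is_identifier(s):
    """True if the whole string s matches the id pattern."""
    return ID_RE.fullmatch(s) is not None


for s in ["count", "x1", "1x", ""]:
    print(repr(s), is_identifier(s))
```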
Lexical Analysis
After writing a regular expression for each kind of token, we may combine them into one big regular expression describing all tokens.
id → letter(letter | digit)∗
num → digit(digit)∗
relop → < | > | == | != | >= | <=
token → id | num | relop | . . .
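One way to experiment with the combined definition is to join the per-token patterns into a single alternation with one named group per token type. This is a sketch under assumptions, not the lecture's analyzer: the names `TOKEN_RE` and `tokenize` are made up, whitespace skipping is added for convenience, and because Python's `re` tries alternatives from left to right, the two-character operators are listed before `<` and `>`.

```python
import re

letter = "[A-Za-z]"
digit = "[0-9]"

# token → id | num | relop, one named group per alternative.
TOKEN_RE = re.compile(
    "|".join([
        f"(?P<id>{letter}(?:{letter}|{digit})*)",
        f"(?P<num>{digit}{digit}*)",
        "(?P<relop>==|!=|>=|<=|<|>)",   # longer operators first
    ])
)


def tokenize(text):
    """Return a list of (type, value) pairs for the tokens in text."""
    pos = 0
    out = []
    while pos < len(text):
        if text[pos].isspace():          # assumption: whitespace separates tokens
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out


print(tokenize("count <= 123"))
```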
State Diagrams
A regular expression may be represented by a state diagram.
The state diagram provides a good guide to writing a lexical analyzer program.
Example
Example (State Diagrams)
[Figure: state diagrams for id (a letter followed by a loop on letter | digit), for num (a digit followed by a loop on digit), and for token, which combines the two.]
State Diagrams
Unfortunately, it is not that simple.
At what point may we stop in an accepting state?
Do not read “count” as 5 identifiers: “c”, “o”, “u”, “n”, “t”.
When we stop in an accepting state, we must be able to determine the type of token processed.
Did we read the id token “count” or did we read the if token “if”?
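The slide leaves the id-versus-if question open. One common approach, shown here as an assumption rather than as the lecture's method, is to scan every word with the identifier pattern and then look it up in a table of reserved words. The keyword list and the function name are hypothetical.

```python
# Hypothetical reserved-word table; a real language would list all its keywords.
KEYWORDS = {"if", "else", "while"}


def classify_word(lexeme):
    """Return the token type for a lexeme that matched the identifier pattern."""
    return lexeme if lexeme in KEYWORDS else "id"


print(classify_word("if"))
print(classify_word("count"))
```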
Example
Example (State Diagrams)
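Consider state diagrams to accept relational operators ==, !=, <, >, <=, and >=.
[Figure: separate state diagrams for the individual operators, with transitions labeled by the operator characters =, !, < and >.]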
Example
Example (State Diagrams)
Combine them into a single state diagram.
[Figure: a single state diagram for relop with states 1 through 4 and transitions labeled = | !, < | >, and =.]
State Diagrams
When we reach an accepting state, how can we tell which operator was processed?
In general, we design the diagram so that each kind of token has its own accepting state.
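A table-driven sketch of that design idea follows: every relational operator ends in its own accepting state, so the final state alone identifies the operator. The state names are invented for this example and do not correspond to the numbered states in the figure.

```python
# Transition table: (state, input character) -> next state.
# A missing entry plays the role of the dead state.
TRANSITIONS = {
    ("start", "="): "eq1",
    ("start", "!"): "bang",
    ("start", "<"): "lt",
    ("start", ">"): "gt",
    ("eq1", "="): "eq",
    ("bang", "="): "ne",
    ("lt", "="): "le",
    ("gt", "="): "ge",
}

# One accepting state per operator; "eq1" and "bang" are not accepting.
ACCEPTING = {"eq": "==", "ne": "!=", "lt": "<", "le": "<=", "gt": ">", "ge": ">="}


def recognize(s):
    """Run the DFA on the whole string; return the operator, or None if rejected."""
    state = "start"
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:   # dead state
            return None
    return ACCEPTING.get(state)


for s in ["<", "<=", "==", "!=", "=", "!"]:
    print(repr(s), recognize(s))
```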
State Diagrams
If we reach state 3, how do we decide whether to continue to state 4?
We read characters until the current character does not match any pattern, i.e., it would lead to the dead state.
At that point, we accept the string, minus the last character.
Later, processing resumes with the last character.
State Diagrams
The Maximal Munch Principle
Process as many symbols as possible and still be able to match a regular expression.
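The principle can be sketched as a small table-driven scanner for id, num, and relop. All names here (`char_class`, `next_token`, `scan`, the state names) are assumptions made for the example, and whitespace skipping is added for convenience. The scanner reads until the next character would lead to the dead state, then resumes from that character; it also remembers the most recent accepting state, a slight generalization of "accept the string minus the last character".

```python
def char_class(ch):
    """Map a character to the input symbol used in the transition table."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return ch


TRANSITIONS = {
    ("start", "letter"): "id", ("id", "letter"): "id", ("id", "digit"): "id",
    ("start", "digit"): "num", ("num", "digit"): "num",
    ("start", "<"): "lt", ("lt", "="): "le",
    ("start", ">"): "gt", ("gt", "="): "ge",
    ("start", "="): "eq1", ("eq1", "="): "eq",
    ("start", "!"): "bang", ("bang", "="): "ne",
}

# Accepting states and the token type each one reports.
ACCEPTING = {"id": "id", "num": "num", "lt": "relop", "le": "relop",
             "gt": "relop", "ge": "relop", "eq": "relop", "ne": "relop"}


def next_token(text, pos):
    """Return (type, lexeme, new_pos) for the longest token starting at pos."""
    state = "start"
    last_accept = None   # (token type, end position) of the longest match so far
    i = pos
    while i < len(text):
        state = TRANSITIONS.get((state, char_class(text[i])))
        if state is None:            # dead state: stop without consuming text[i]
            break
        i += 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], i)
    if last_accept is None:
        raise ValueError(f"no token at position {pos}")
    kind, end = last_accept
    return kind, text[pos:end], end


def scan(text):
    """Split text into (type, lexeme) pairs, skipping whitespace."""
    pos, out = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        kind, lexeme, pos = next_token(text, pos)
        out.append((kind, lexeme))
    return out


print(scan("count<=123"))
```

With this loop, “count” comes out as one id token rather than five, and “<=” is read as a single relop rather than “<” followed by a stray “=”.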
Example
Example (State Diagrams)
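[Figure: state diagram for relop in which each of =, !, < and > may be followed by =, with additional transitions labeled “other” for any other next character.]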
Assignment
Homework
Read Sections 3.1 - 3.4.