Lexical Analysis
Mohammad Ghafari, Spring 2019
Source: scg.unibe.ch/download/lectures/cc/CC-02-Lexical.pdf
What is a language?
The method of human communication, either spoken or written, consisting of the use of words in a structured and conventional way.
What is a programming language?
The means of communication with machines, often written in ASCII characters.
We need a “valid” language
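Validity breaks down into syntax and semantics. The former is the arrangement of words, while the latter is the meaning of words.
For example:
1. The dog the man walks.
2. The dog walks the man.
3. The man walks the dog.
Sentence 1 is not syntactically valid; sentence 2 is syntactically valid but semantically odd; sentence 3 is valid on both counts.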
Lexical analysis
The process of mapping sequences of characters to tokens in a particular language.
x = x + y  →  <ID, x> <EQ> <ID, x> <Plus> <ID, y>
[Figure: source → Scanner → tokens → Parser, with errors reported on the side]
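As a minimal sketch of this mapping, the following Python scanner turns x = x + y into the token sequence given on the slide. The token names ID, EQ, and Plus come from the slide; the `tokenize` function, the `TOKEN_SPEC` table, and the identifier pattern are assumptions made for illustration only.

```python
import re

# Hypothetical token table: (name, regular expression). SKIP marks nontokens.
TOKEN_SPEC = [
    ("ID", r"[a-z][a-z0-9]*"),
    ("EQ", r"="),
    ("Plus", r"\+"),
    ("SKIP", r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Map a character sequence to a list of tokens, or fail on a lexical error."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind = match.lastgroup
        if kind == "ID":
            tokens.append((kind, match.group()))  # identifiers carry their lexeme
        elif kind != "SKIP":
            tokens.append((kind,))                # operators need no attribute
        pos = match.end()
    return tokens


print(tokenize("x = x + y"))
# [('ID', 'x'), ('EQ',), ('ID', 'x'), ('Plus',), ('ID', 'y')]
```

A character that no rule matches, such as a digit at the start of a token in this reduced table, raises a SyntaxError, which plays the role of the scanner's error output.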
Typical token types
Nontokens are:
• comments,
• blanks, tabs, and newlines,
• etc.
NB. Each reserved word, such as if, void, or return, has a dedicated token.
Regular expressions
We use regular expressions to specify the lexical structure of a language.
A language is a set of strings, and a string is a finite sequence of symbols.
We can decide whether a string is in the language or not.
Notations
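If M and N are languages, then:
• M | N is alternation: a string is in M | N if it is in M or in N.
• M N is concatenation: a string from M followed by a string from N.
• M* is repetition: zero or more strings from M concatenated together.
Operators later in this list bind tighter.
Useful extensions:
• [abc] means (a|b|c)
• [d-g] means [defg]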
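One way to make these operators concrete is to treat a language as a Python set of strings. The sketch below is an illustration, not part of the lecture: the function names are hypothetical, and `repetition` is cut off at `max_repeats` copies because the real M* is usually infinite.

```python
from itertools import product


def alternation(m, n):
    """M | N: strings that are in M or in N."""
    return m | n


def concatenation(m, n):
    """M N: every string of M followed by every string of N."""
    return {x + y for x, y in product(m, n)}


def repetition(m, max_repeats):
    """M*, truncated to at most max_repeats copies so the result stays finite."""
    result = {""}          # zero repetitions give the empty string
    layer = {""}
    for _ in range(max_repeats):
        layer = concatenation(layer, m)
        result |= layer
    return result


a, b, c = {"a"}, {"b"}, {"c"}
# Concatenation binds tighter than alternation, so ab|c means (ab)|c.
print(sorted(alternation(concatenation(a, b), c)))   # ['ab', 'c']
# Repetition binds tighter than concatenation, so ab* means a(b*).
print(sorted(concatenation(a, repetition(b, 2))))    # ['a', 'ab', 'abb']
```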
Some examples
How about the following?
ab|c
(a|b)*
aa*bb*
a*(abb*)*(a|)
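To check a guess about what each expression describes, it can help to test a few strings mechanically. The sketch below uses Python's `re.fullmatch`, which accepts the same notation for these four expressions; the candidate strings and the helper name `in_language` are arbitrary choices made for this illustration.

```python
import re

PATTERNS = [r"ab|c", r"(a|b)*", r"aa*bb*", r"a*(abb*)*(a|)"]
CANDIDATES = ["", "a", "b", "c", "ab", "ac", "aabb", "ba", "abaab"]


def in_language(pattern, text):
    """True if the whole string, not just a prefix, matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for pattern in PATTERNS:
    members = [s for s in CANDIDATES if in_language(pattern, s)]
    print(f"{pattern:16} accepts {members}")
```

Testing samples can refute a description of the language but cannot prove one, so the exercise of characterizing each expression in words still stands.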
Principle of longest match
Usually, the scanner should pick the longest possible string as the next token.
return flag != if8;  →  Scanner  →  <RETURN> <ID, flag> <NEQ> <ID, if8> <SCOLON>
Here if8 is scanned as a single identifier, not as the keyword if followed by 8.
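The rule can be sketched in Python by trying every token pattern at the current position and keeping the longest match. The rule list, the tie-breaking policy (earlier rules win, so keywords precede identifiers), and the function name `scan` are assumptions for this illustration; the token names follow the slide.

```python
import re

# Python's regex alternation takes the first alternative that matches,
# not the longest one, so on its own it stops after the keyword.
print(re.match(r"if|[a-z][a-z0-9]*", "if8").group())   # if

# Hypothetical rule list; on equal length the earlier rule wins.
RULES = [
    ("RETURN", re.compile(r"return")),
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[a-z][a-z0-9]*")),
    ("NEQ", re.compile(r"!=")),
    ("SCOLON", re.compile(r";")),
    ("WS", re.compile(r"[ \t\n]+")),
]


def scan(text):
    """Split text into tokens, always choosing the longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, pattern in RULES:
            match = pattern.match(text, pos)
            if match and match.end() > best_end:   # strictly longer only
                best_kind, best_end = kind, match.end()
        if best_kind is None:
            raise SyntaxError(f"no token matches at position {pos}")
        if best_kind == "ID":
            tokens.append((best_kind, text[pos:best_end]))
        elif best_kind != "WS":
            tokens.append((best_kind,))
        pos = best_end
    return tokens


print(scan("return flag != if8;"))
# [('RETURN',), ('ID', 'flag'), ('NEQ',), ('ID', 'if8'), ('SCOLON',)]
```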
Finite state automata
• A finite automaton has a finite set of states; edges lead from one state to another, and each edge is labeled with a symbol. One state is the start state, and certain of the states are distinguished as final states.
• Finite automata are recognizers; they simply say "yes" or "no" about each possible input string.
• They come in two flavors:
  – Nondeterministic finite automata (NFA)
  – Deterministic finite automata (DFA)
[Figure: an automaton with a start state, a final state, and edges labeled a, t, and c]
Example
The regular expression [a-z][a-z0-9]* specifies an identifier.
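A two-state automaton is enough to recognize this expression. The sketch below encodes one such automaton as a transition table; the state numbers 1 and 2, the character classes, and the function names are assumptions, since the slide's drawing is not reproduced here.

```python
import string


def char_class(ch):
    """Group characters so the table needs one entry per class, not per symbol."""
    if ch in string.ascii_lowercase:
        return "letter"
    if ch in string.digits:
        return "digit"
    return None


# (state, character class) -> next state; a missing entry means "reject".
TRANSITIONS = {
    (1, "letter"): 2,   # first character must be a letter
    (2, "letter"): 2,   # then any mix of letters ...
    (2, "digit"): 2,    # ... and digits
}
START, FINALS = 1, {2}


def is_identifier(text):
    """Run the automaton over text and answer yes or no."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in FINALS


print(is_identifier("if8"), is_identifier("8if"))   # True False
```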
NFA
It is an automaton that has a choice of edges – labeled with the same symbol – to follow out of a state. Or it may have special edges labeled with epsilon that can be followed without eating any symbol from the input.
(a|b)*abb
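An NFA can be run directly by tracking the set of states it could be in after each symbol. The sketch below assumes the usual four-state NFA for (a|b)*abb, with states s0 to s3 as tabulated in the conversion slide that follows; that NFA has no epsilon edges, so no closure step is needed here. The function name and the dictionary encoding are illustrative choices.

```python
# (state, symbol) -> set of possible next states.
# State s0 has two edges labeled a, which is what makes this an NFA.
NFA = {
    ("s0", "a"): {"s0", "s1"},
    ("s0", "b"): {"s0"},
    ("s1", "b"): {"s2"},
    ("s2", "b"): {"s3"},
}
START, FINALS = "s0", {"s3"}


def nfa_accepts(text):
    """Follow every possible choice at once; accept if any path ends in a final state."""
    current = {START}
    for symbol in text:
        current = {nxt for state in current for nxt in NFA.get((state, symbol), ())}
    return bool(current & FINALS)


for sample in ["abb", "babb", "ab", "abba"]:
    print(sample, nfa_accepts(sample))
```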
DFA
In this automaton no two edges leaving from the same state are labeled with the same symbol.
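This condition is easy to check mechanically. The sketch below takes an automaton as a list of (source, label, target) edges, which is an encoding chosen for this illustration; it also rejects epsilon edges (written as `None`), the usual companion condition for a DFA even though the slide states only the first one.

```python
from collections import Counter


def is_deterministic(edges):
    """True if no state has two outgoing edges with the same label and no edge is epsilon."""
    if any(label is None for _, label, _ in edges):
        return False
    counts = Counter((source, label) for source, label, _ in edges)
    return all(n == 1 for n in counts.values())


nfa_edges = [("s0", "a", "s0"), ("s0", "a", "s1"), ("s0", "b", "s0"),
             ("s1", "b", "s2"), ("s2", "b", "s3")]
dfa_edges = [(1, "letter", 2), (2, "letter", 2), (2, "digit", 2)]

print(is_deterministic(nfa_edges))   # False: s0 has two edges labeled a
print(is_deterministic(dfa_edges))   # True
```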
Converting an NFA to a DFA
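NFA transition table:

| states | a | b |
|---|---|---|
| s0 | {s0, s1} | s0 |
| s1 | ∅ | s2 |
| s2 | ∅ | s3 |
| s3 | ∅ | ∅ |

DFA transition table:

| states | a | b |
|---|---|---|
| s0 | {s0, s1} | s0 |
| {s0, s1} | {s0, s1} | {s0, s2} |
| {s0, s2} | {s0, s1} | {s0, s3} |
| {s0, s3} | {s0, s1} | s0 |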
Example
• Find the corresponding DFA of the following automaton.
• Draw a DFA that accepts the aa*bb* expression.
[Figure: an automaton with states A, B, and C and edges labeled a, b, and ε]
Compute ε-closure
Let's define ε-closure(T) as the set of states reachable from any state in T using only ε-transitions, including the states of T themselves.
push all states of T onto stack;
initialize ε-closure(T) to T;
while (stack is not empty) {
    pop t from the stack;
    for (each state u with an edge from t to u labeled ε)
        if (u is not in ε-closure(T)) {
            add u to ε-closure(T);
            push u onto stack;
        }
}
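The pseudocode above translates almost line for line into Python. In this sketch the NFA is a dictionary from (state, symbol) to a set of states, with `None` standing for an epsilon edge; both the encoding and the small example automaton are assumptions made for illustration.

```python
def epsilon_closure(states, transitions):
    """Return every state reachable from `states` using epsilon edges only."""
    closure = set(states)            # initialize the closure to T
    stack = list(states)             # push all states of T onto the stack
    while stack:
        t = stack.pop()
        for u in transitions.get((t, None), ()):   # edges from t labeled epsilon
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)


# Hypothetical NFA: 0 -eps-> 1, 0 -eps-> 2, 1 -eps-> 3, 3 -eps-> 1, 2 -a-> 4
EXAMPLE = {
    (0, None): {1, 2},
    (1, None): {3},
    (3, None): {1},      # an epsilon cycle; the membership test prevents looping
    (2, "a"): {4},
}

print(sorted(epsilon_closure({0}, EXAMPLE)))   # [0, 1, 2, 3]
print(sorted(epsilon_closure({2}, EXAMPLE)))   # [2]
```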
The subset construction
Let's define move(T, a) as the set of NFA states to which there is a transition on input symbol “a” from some state s in T.
initially, ε-closure(s0) is the only state in Dstates, and it is unmarked;
while (there is an unmarked state T in Dstates) {
    mark T;
    for (each input symbol a) {
        U = ε-closure(move(T, a));
        if (U is not in Dstates)
            add U as an unmarked state to Dstates;
        Dtran[T, a] = U;
    }
}
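A runnable sketch of this loop follows. It is applied to the four-state NFA with states s0 to s3 from the "Converting an NFA to a DFA" slide, so its output can be compared with the DFA table given there. The dictionary encoding (with `None` for epsilon edges) and the function names are assumptions; an empty set U, if it arose, would simply become a dead state.

```python
def epsilon_closure(states, transitions):
    """States reachable from `states` through epsilon edges (label None)."""
    closure, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        for u in transitions.get((t, None), ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)


def move(states, symbol, transitions):
    """NFA states reachable from some state in `states` on one `symbol` edge."""
    return {u for s in states for u in transitions.get((s, symbol), ())}


def subset_construction(start, alphabet, transitions):
    """Return (Dstates, Dtran), where each DFA state is a frozenset of NFA states."""
    start_set = epsilon_closure({start}, transitions)
    dstates = [start_set]
    unmarked = [start_set]
    dtran = {}
    while unmarked:
        T = unmarked.pop()                      # taking T off the list marks it
        for a in alphabet:
            U = epsilon_closure(move(T, a, transitions), transitions)
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran


NFA = {
    ("s0", "a"): {"s0", "s1"},
    ("s0", "b"): {"s0"},
    ("s1", "b"): {"s2"},
    ("s2", "b"): {"s3"},
}

dstates, dtran = subset_construction("s0", "ab", NFA)
for (T, a), U in dtran.items():
    print(sorted(T), a, "->", sorted(U))
```

A DFA state is final when its set contains a final NFA state; with s3 final in the NFA, that is the state {s0, s3} here.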
Example
Apply the subset construction to the following NFA.
(a|b)*abb
Example (answer)
Lexical analyzer
Each automaton accepts a certain token, and the combination of several automata can serve as a lexical analyzer (also known as a lexer or scanner).
Lexer in practice
The lexer must keep track of the longest match seen so far, and the input position of that match.
Example
• | the input position at each call to the lexer.
• ⊥ the current position.
• ⊤ the most recent position at which the automaton was in a final state.
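The three markers correspond to three variables in a DFA-driven lexer. In the sketch below, `start` plays the role of |, `pos` the role of ⊥, and `last_final_end` the role of ⊤. The small automaton, its state numbers, and the token names ID, NUM, REAL, DOT, and WS are hypothetical and chosen so that the lexer has to fall back to an earlier final state.

```python
def step(state, ch):
    """Hypothetical combined DFA; returns the next state or None if stuck."""
    letter, digit = "a" <= ch <= "z", "0" <= ch <= "9"
    if state == 1:                       # start state
        if letter: return 2
        if digit: return 3
        if ch == ".": return 6
        if ch == " ": return 7
    elif state == 2 and (letter or digit): return 2
    elif state == 3:
        if digit: return 3
        if ch == ".": return 4           # state 4 is not final
    elif state in (4, 5) and digit: return 5
    elif state == 7 and ch == " ": return 7
    return None


FINAL = {2: "ID", 3: "NUM", 5: "REAL", 6: "DOT", 7: "WS"}


def next_token(text, start):
    """Return (kind, lexeme, new position) for the longest token at `start`."""
    state, pos = 1, start
    last_final, last_final_end = None, start
    while pos < len(text):
        state = step(state, text[pos])
        if state is None:
            break
        pos += 1
        if state in FINAL:               # remember the longest match so far
            last_final, last_final_end = FINAL[state], pos
    if last_final is None:
        raise SyntaxError(f"no token at position {start}")
    return last_final, text[start:last_final_end], last_final_end


def scan(text):
    tokens, pos = [], 0
    while pos < len(text):
        kind, lexeme, pos = next_token(text, pos)   # resume at the last final position
        if kind != "WS":
            tokens.append((kind, lexeme))
    return tokens


print(scan("x1 42.5 42.y"))
# [('ID', 'x1'), ('REAL', '42.5'), ('NUM', '42'), ('DOT', '.'), ('ID', 'y')]
```

On the input 42.y the automaton reads past the dot into a non-final state and gets stuck on y, so the lexer returns NUM for 42 and restarts from the dot rather than from where it got stuck.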
Acknowledgement
• Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman.
• Modern Compiler Implementation in Java by Andrew W. Appel and Jens Palsberg.