topic 2: lexical analysis - uamarantxa.ii.uam.es/~modonnel/compilers/02_lexicalanalysis.pdf1...
TRANSCRIPT
Compilers
Topic 2: Lexical Analysis
Mick O'Donnell : [email protected]
2.1. Introduction
Introduction
• The Role of the Lexical Analyser
Source Code → Lexical Analyser → Syntactic Analyser → Semantic Analyser (the FRONT END)
• Also known as tokeniser or scanner.
• In Spanish, called analizador morfológico.
• Purpose: translation of the source code into a sequence of symbols.
• The symbols identified by the lexical analyser will be considered terminal symbols in the grammar used by the syntactic analyser.
Lexical Analyser
Introduction
“begin
int A;
A := 100;
A := A+A;
output A
end”
(reserved-word,begin)
(type, int)(<id>,A)(<symb>,;)
(<id>,A)(<mult-symb>,:=)
(<cons int>,100)(<symb>,;)
(<id>,A)(<mult-symb>,:=)
(<id>,A)(<symb>,+)(<id>,A)(<symb>,;)
(reserved-word,output)(<id>,A)
(reserved-word,end)
• Other tasks:
• Identification of lexical errors,
• e.g., starting an identifier with a digit where the language does not allow this: 2abc
• Deletion of white-space:
• Usually, the function of white-space is only to separate tokens.
• Exceptions: languages where whitespace indicates code blocks, e.g., Python:
if 1 == 2:
    print 1
print 2
• Deletion of comments: not relevant to execution of program.
What are Symbols?
• How do we determine what are the symbols of a given language?
• Case: Assume we have a language with assignment operator :=
• The ‘assignment statement’ has syntax:
STATEMENT → ID ASSIGNOP EXPR ‘;’
• The rule for ASSIGNOP could be:
ASSIGNOP → ‘:=’
…meaning ‘:=’ is a symbol, and thus a unit of lexical analysis.
• However, the rule might have been:
ASSIGNOP → ‘:’ ‘=’
…meaning ‘:’ and ‘=’ are two symbols for lexical analysis.
Drawing the border between symbols
A := 1 + 2
• General Rules:
• A symbol is a sequence of characters that cannot be separated from each other by white space.
• Symbols can be separated from other symbols by white space.
• With A := 1 + 2
• ‘:=’ can be separated from ‘A’ and ‘1’
• BUT ‘:’ cannot be separated from ‘=’
• Thus ‘:=’ should be treated as a symbol.
What token labels to use?
• To determine which token labels we assign to symbols, we first need to derive the syntactic grammar of the language.
• THEN, we extract out the terminal symbols of this grammar, which become the token labels in lexical analysis.
• This ensures that the labels assigned in lexical analysis are what we need in syntactic analysis.
• For example, we might assign the label “reserved_word” to both “begin” and “end”.
• But it is clear we cannot use such a label in parsing:
Program -> reserved_word Statement* reserved_word
• … would allow “end A=1 begin” as a program.
• Each token label has to reflect the different roles that the token class can serve in a program.
Determining the Token set
1 : <program> ::= begin <dcl train> ; <stm train> end
2 : <dcl train> ::= <declaration>
3 : | <declaration> ; <dcl train>
4 : <stm train> ::= <statement>
5 : | <statement> ; <stm train>
6 : <declaration>::= <mode> <idlist>
7 : <mode> ::= bool
8 : | int
9 : | ref <mode>
10 : <idlist> ::= <id>
11 : | <id> , <idlist>
12 : <statement> ::= <asgt stm>
13 : | <cond stm>
14 : | <loop stm>
15 : | <transput stm>
15 : | <case stm>
16 : | call <id>
17 : <asgt stm> ::= <id> := <exp>
18 : <cond stm> ::= if <exp> then <stm train> fi
19 : | if <exp> then <stm train> else <stm train> fi
Identifying the scope of the lexical analysis in the grammar of the language
20 : <loop stm> ::= while <exp> do <stm train> end
21 : | repeat <stm train> until <exp>
22 : <transput stm> ::= input <id>
23 : | output <exp>
24 : <exp> ::= <factor>
25 : | <exp> + <factor>
26 : | <exp> - <factor>
27 : | - <exp>
28 : <factor> ::= <primary>
29 : | <factor> * <primary>
30 : <primary> ::= <id>
31 : | <constant>
32 : | ( <exp> )
33 : | ( <compare> )
34 : <compare> ::= <exp> = <exp>
35 : | <exp> <= <exp>
36 : | <exp> > <exp>
Topic 2
One and Two Pass Lexical Analysis
• Identifies symbols and immediately assigns a token label to each symbol:
One Pass Lexical Analyser
“begin
int A;
A := 100;
A := A+A;
print A
end”
(begin,begin)(type, int) (id,A)
(semic,;) (id,A) (eqsgn,:=)
(int,100)(semic,;) (id,A)
(eqsgn,:=) (id,A) (symb,+) (id,A)
(semic,;)
(print,print) (id,A)
(end,end)
• In a two-pass lexical analyser:
• First pass groups characters into symbols
• Second pass assigns token labels to symbols
Two Pass Lexical Analysis
“begin
int A;
A := 100;
A := A+A;
print A
end”
(begin,begin)(type, int) (id,A)
(semic,;) (id,A) (eqsgn,:=)
(int,100)(semic,;) (id,A)
(eqsgn,:=) (id,A) (symb,+) (id,A)
(semic,;)
(print,print) (id,A)
(end,end)
“begin” “int” “A” “;” “A”
“:=” “100” “;” “A” “:=”
“A” “+” “A” “;” “print”
“A” “end”
• Most programming languages are designed such that the code can be segmented into tokens without any knowledge at all of the meaning of the token.
• Simple rules are adhered to:
• White-space ends a symbol
• Multiple white-space ignored
• identifiers contain only alphanumeric chars or _
• identifiers never start with a number
• a symbol starting with a number IS a number: 1, 34, 10.0
• Some chars are always a symbol by themselves: } { ; ( ) ,
• Mathematical chars can be solo or followed by ‘=’:
• =, >, <, +, -, /, *
• ==, >=, <=, +=, -=, /=, *=
• The first char of the symbol tells us which group it is in
• Identifier rules:
• Java: Consists of Unicode letters, _, $, 0-9.
Cannot start with 0-9.
• C: Consists of a-z A-Z 0-9 _
Cannot start with 0-9 or _
• Exceptions:
• Lisp:
• Identifier consists of a-z A-Z 0-9 _ + - * / @ $ = < > . etc.
• No restriction on starting char
• If char sequence can be interpreted as a number, it is
• Else it is an ‘identifier’
• E.g., ‘1+’ is an ‘identifier’
‘+1’ is a number
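The Lisp rule above can be sketched in Python. This is an illustration only: `classify` is a made-up helper, and using `float()` as the "can it be interpreted as a number?" test is an assumption, not Lisp's actual reader algorithm.

```python
def classify(symbol):
    # Lisp-style rule: if the character sequence can be
    # interpreted as a number, it IS a number; else it is an identifier.
    try:
        float(symbol)
        return 'number'
    except ValueError:
        return 'identifier'

print(classify('+1'))  # number
print(classify('1+'))  # identifier
```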
Topic 3
Methods of Lexical Analysis
Three main Approaches:
1) Ad-Hoc Coding : code is written to recognise each type of token.
2) Regular expressions: e.g.,
• float: “[0-9]*\.[0-9]+”
• Id: “[a-zA-Z_][a-zA-Z_0-9]*”
3) Context free grammar, e.g.,
Token :- Id | Int | Literal | …
Id :- Alfa | Alfa Id2
Id2 :- Alfa | Digit | Alfa Id2|Digit Id2
…
Lexical Analyser: Using grammars
Approaches to Lexical Analysis
Topic 2.1
Ad Hoc Coding of Lexical Analysis: Recognising Symbols
• Common approach (1):
• Human writes code to recognise the tokens of the source language:
Lexical Analyser
Two Pass Lexical Analysis with ad-hoc code
def tokenise():
    symbolList = []
    while not eof():
        # process next chars until end of symbol
        # add symbol to symbolList
        . . .
    return symbolList
def tokenise():
    symbolList = []
    while not eof():
        case type(nextc):
            'whitespace': ...
            'alpha': ...
            'digit': ...
            etc.
    return symbolList
def type(char):
    if char in "a-zA-Z_": return 'alpha'
    if char in "0-9": return 'digit'
    if char in " \t\n": return 'whitespace'
    if char in "{};,": return 'sepchar'
    if char in "><=+-/*": return 'mathchar'
def tokenise():
    symbolList = []
    while not eof():
        case type(nextc):
            'alpha':  # alpha includes here '_'
                symbol = "" + getc()
                while type(nextc) in ['alpha', 'digit']:
                    symbol += getc()
                symbolList.append(symbol)
            'whitespace': getc()
            'digit': ...
            ...
. . .
            'mathchar':  # = > < + - * /
                symbol = "" + getc()
                if nextc == '=':
                    symbol += getc()
                symbolList.append(symbol)
            'sepchar':  # { } ; ,
                symbol = "" + getc()
                symbolList.append(symbol)
            default: print("ERROR: Unknown Char: " + getc())
Numbers:
• Formats: 1, 34, 34.001, .0
• Procedure:
1) Read digits until we reach a nondigit
2) If nextchar is “.”, then read digits until we reach a nondigit
'digit':
    symbol = "" + getc()
    while nextc in "0123456789":
        symbol += getc()
    if nextc == ".":
        symbol += getc()
        while nextc in "0123456789":
            symbol += getc()
    symbolList.append(symbol)
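The branches above can be assembled into one runnable sketch. This is an illustration under stated assumptions: it scans a string instead of using `getc()`/`nextc`, and ‘:’ is added to the math-char set so that ‘:=’ is recognised (the slides' character list omits it).

```python
ALPHA = set('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_')
DIGITS = set('0123456789')
SEP = set('{};,()')
MATH = set('><=+-/*:')   # ':' added so ':=' is handled as a math-char pair

def tokenise(text):
    symbols, i, n = [], 0, len(text)
    while i < n:
        c = text[i]
        if c in ' \t\n':                       # whitespace: skip
            i += 1
        elif c in ALPHA:                       # identifier / reserved word
            j = i + 1
            while j < n and text[j] in ALPHA | DIGITS:
                j += 1
            symbols.append(text[i:j]); i = j
        elif c in DIGITS:                      # number, optionally with '.'
            j = i + 1
            while j < n and text[j] in DIGITS:
                j += 1
            if j < n and text[j] == '.':
                j += 1
                while j < n and text[j] in DIGITS:
                    j += 1
            symbols.append(text[i:j]); i = j
        elif c in MATH:                        # math char, optionally followed by '='
            j = i + 2 if i + 1 < n and text[i + 1] == '=' else i + 1
            symbols.append(text[i:j]); i = j
        elif c in SEP:                         # always a symbol by itself
            symbols.append(c); i += 1
        else:
            raise ValueError('Unknown char: ' + c)
    return symbols

print(tokenise("A := 100;"))  # ['A', ':=', '100', ';']
```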
Topic 2.2
Ad Hoc Coding of Lexical Analysis: Assigning Token Labels
• Second Stage: assigning token labels to symbols
1. Reserved words matched by comparison (or hash lookup):
If Symbol in RESERVED_WORDS: Token = symbol
2. Use regular expressions for user-supplied symbols:
• Int : “[0-9]+”
• Float : “[0-9]*\.[0-9]+”
• Id : “[a-zA-Z_][a-zA-Z0-9_]*”
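A sketch of this second stage in Python. The contents of RESERVED_WORDS and the fallback label 'symb' are illustrative assumptions; the three regular expressions follow the slides.

```python
import re

RESERVED_WORDS = {'begin', 'end', 'if', 'then', 'fi', 'while', 'do'}  # sample set

INT_RE   = re.compile(r'[0-9]+')
FLOAT_RE = re.compile(r'[0-9]*\.[0-9]+')
ID_RE    = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')

def label(symbol):
    # 1. Reserved words matched by comparison (hash lookup)
    if symbol in RESERVED_WORDS:
        return symbol
    # 2. Regular expressions for user-supplied symbols
    if FLOAT_RE.fullmatch(symbol):
        return 'float'
    if INT_RE.fullmatch(symbol):
        return 'int'
    if ID_RE.fullmatch(symbol):
        return 'id'
    return 'symb'                 # e.g. ':=', ';', '+'

print(label('begin'), label('100'), label('34.001'), label('A'))
```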
Two Pass Lexical Analyser
Topic 2.2
Ad Hoc Coding of Lexical Analysis:
Single Pass Approach
The code shown earlier was for recognising symbols.
• Symbols of different types were recognised in different parts.
• We can use this to simplify token labelling
• The specific code used to identify the symbol knows if it is a number, alphanumerical, mathematical or separator.
• Thus, we can use this code to assign token label as well
def tokenise():
    symbolList = []
    while not eof():
        while nextc in WHITE_SPACE_CHARS: getc()
        symbtype = type(nextc)
        symbol = "" + getc()
        case symbtype:
            'alpha': ...
            'digit': ...
            'sepchar': ...
            'mathchar': ...
        symbolList.append([token, symbol])
    return symbolList
Single Pass Lexical Analyser
def tokenise():
    symbolList = []
    while not eof():
        while nextc in WHITE_SPACE_CHARS: getc()
        symbtype = type(nextc)
        symbol = "" + getc()
        case symbtype:
            'alpha':  # alpha includes here '_'
                while type(nextc) in ['alpha', 'digit']:
                    symbol += getc()
                if symbol in RESERVED_WORDS:
                    token = symbol
                else:
                    token = 'id'
            ...
        symbolList.append([token, symbol])
    return symbolList
Topic 2.3
Lexical Analysis using Regular Expressions
• The previous section looked at lexical analysis informally, just in terms of a computer program written by hand to recognise the tokens of a language.
• The rules of lexical syntax are only represented implicitly in the code. One has to interpret the code to see that an identifier must start with an alpha char.
• In earlier days, this was sufficient.
• However, there are problems with this approach:
• Portability: a change in syntax requires editing of the source code (it may be better to state the lexical structure in an external data file, requiring no need to edit the source code)
• Difficult to prove that the tokenising code actually conforms to the specification of the language – does it do as it should?
• This section will explore lexical analysis from a more formal perspective
Lexical analysis using regular expressions and grammars
• One approach is to describe the tokens in terms of regular expressions
• A program can read in these regular expressions and generate code to perform the tokenisation
• FLEX and LEX
Lexical analysis using regular expressions
• LEX and YACC (Yet Another Compiler Compiler) are often used to build a compiler quickly
• One does not write the lexical analyser directly, just the patterns to recognise tokens
Standard Compiler Architecture
Lexical rules → Lex → MyLexAnalyser
Syntactic rules → Yacc → MySynAnalyser
Source Code → MyLexAnalyser → MySynAnalyser
DIGIT [0-9]
ID [a-z][a-z0-9]*
%%
{DIGIT}+              { printf("(integer, '%s')", yytext); }
{DIGIT}+"."{DIGIT}*   { printf("(float, '%s')", yytext); }
if|then|begin|end|procedure|function   { printf("(%s, '%s')", yytext, yytext); }
{ID}                  printf("(id, '%s')", yytext);
"+"|"-"|"*"|"/"       printf("(mathop, '%s')", yytext);
[ \t\n]+              /* eat up whitespace */
.                     printf("Unrecognized character: %s\n", yytext);
Flex input for a simple tokeniser
• Flex generates C code to proceed char by char through the input text.
• When the generated scanner is run, it analyzes its input looking for strings which match any of its patterns.
• If it finds more than one match, it takes the one matching the most text.
• Thus “23.56” will be recognised as a float, not an integer:
{DIGIT}+              { printf("(integer, '%s')", yytext); }
{DIGIT}+"."{DIGIT}*   { printf("(float, '%s')", yytext); }
• Once the match is determined, the text corresponding to the match is made available in the global character pointer yytext.
• The action corresponding to the matched pattern is then executed.
• After recognising a token, input scanning for all patterns restarts from that point (any partially matched pattern is discarded).
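The longest-match strategy can be emulated in plain Python. This is a sketch of the idea, not Flex's generated code; the rule set mirrors the Flex input above, and on a length tie the earlier rule wins, as in Flex.

```python
import re

# Ordered (pattern, label) pairs, mirroring the Flex rules above.
RULES = [
    (re.compile(r'[0-9]+\.[0-9]*'), 'float'),
    (re.compile(r'[0-9]+'), 'integer'),
    (re.compile(r'[a-z][a-z0-9]*'), 'id'),
    (re.compile(r'[+\-*/]'), 'mathop'),
    (re.compile(r'[ \t\n]+'), None),          # whitespace: no token
]

def scan(text):
    tokens, i = [], 0
    while i < len(text):
        best, best_label = None, None
        for pattern, lbl in RULES:
            m = pattern.match(text, i)
            # longest match wins; on a tie, the earlier rule wins (as in Flex)
            if m and (best is None or m.end() > best.end()):
                best, best_label = m, lbl
        if best is None:
            raise ValueError('Unrecognized character: ' + text[i])
        if best_label is not None:
            tokens.append((best_label, best.group()))
        i = best.end()
    return tokens

print(scan("23.56"))  # [('float', '23.56')]
```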
Flex processing
Topic 2.4
Lexical Analysis using CFGs
(Context Free Grammars)
Using Grammars for token recognition
• More formally, one can describe the possible tokens of a language using a context-free grammar:
<Token> ::= <Id> | <ReservedWd> | <Number> | ...
<Id> ::= <Letter> | <Letter><IdC>
<IdC> ::= <Letter> | <Digit> | <Letter><IdC> | <Digit><IdC>
• This has the advantage that both the syntactic description of the language and its lexical description are in the same formalism.
• Processing of such grammars is however slower than for regular expressions.
Using a CFG
Using Grammars for token recognition
• However, some context free grammars can be automatically translated into right-regular grammars, where rules are of two forms ( ‘a’ is a terminal; ‘A’ is a nonterminal ):
• A → a
• A → aB
• A grammar in such a form can then be represented as a deterministic finite automaton (DFA), which allows efficient processing of the input.
• A DFA is a finite state machine where, for each pair of state and input symbol, there is one and only one transition to a next state.
Deterministic Finite Automata
To derive a right regular grammar from full context free grammar:
1. For each rule whose RHS starts with a nonterminal, replace the nonterminal with its expansion(s)
E.g., starting from:
Number ::= Integer
Integer ::= IntegerSS | - IntegerSS
IntegerSS ::= digit | digit IntegerSS
…replacing Integer in the Number rule:
Number ::= IntegerSS | - IntegerSS
IntegerSS ::= digit | digit IntegerSS
…then replacing the leading IntegerSS by its expansions:
Number ::= digit | digit IntegerSS | - IntegerSS
IntegerSS ::= digit | digit IntegerSS
Deriving a right-regular grammar
To derive a right regular grammar from full context free grammar:
1. For each rule whose RHS starts with a nonterminal, replace the nonterminal with its expansion(s)
2. At the end of replacements, eliminate any rule which cannot be reached from the START symbol.
e.g., assume the grammar has start symbol Token:
<Token> ::= <Id> | <ReservedWd> | <Number> | <StringLit> | …
We only preserve nonterminals referenced in this rule, or referenced in the nonterminals it contains, etc.
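Step 2 amounts to a reachability pass over the grammar. A minimal sketch, assuming a dict-of-alternatives representation (the sample grammar here is a toy, not the full token grammar):

```python
def reachable_rules(grammar, start):
    # grammar: dict mapping nonterminal -> list of alternatives,
    # each alternative a list of symbols (terminals or nonterminals)
    seen, stack = {start}, [start]
    while stack:
        for alt in grammar[stack.pop()]:
            for sym in alt:
                if sym in grammar and sym not in seen:
                    seen.add(sym)
                    stack.append(sym)
    # keep only rules reachable from the start symbol
    return {nt: alts for nt, alts in grammar.items() if nt in seen}

g = {
    'Token':  [['Id'], ['Number']],
    'Id':     [['letter'], ['letter', 'IdC']],
    'IdC':    [['letter'], ['digit']],
    'Number': [['digit']],
    'Unused': [['x']],            # unreachable: will be eliminated
}
print(sorted(reachable_rules(g, 'Token')))  # ['Id', 'IdC', 'Number', 'Token']
```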
<Token> ::= <Id> | <ReservedWd> | <Number> | <StringLit> | <CharLit> | <SS> | <MS>
<Id> ::= <Letter> | <Letter> <IdC>
<IdC> ::= <Letter> | <Digit> | <Letter> <IdC> | <Digit> <IdC>
<Number> ::= <Integer> | <Real>
<Integer> ::= <IntegerSS> | - <IntegerSS>
<IntegerSS> ::= <Digit> | <Digit> <IntegerSS>
<Real> ::= <FixedPoint> | <FixedPoint> <Exponent>
<FixedPoint> ::= <Integer> . <IntegerSS> | . <IntegerSS> | <Integer> .
<Exponent> ::= E <Integer>
<StringLit> ::= "" | " <CharSeq> "
<CharSeq> ::= <Character> | <Character> <CharSeq>
<CharLit> ::= ' <Character> '
<SS> ::= + | - | * | / | = | < | > | ( | )   /* Simple Symbol */
<MS> ::= =+ | != | <= | >= | ++ | --         /* Multiple Symbol */
<Character> ::= <Letter> | <Digit> | <SS> | ! | . | , | b | \' | \" | \n
<Letter> ::= A | B | ... | Z | a | b | ... | z
<Digit> ::= 0 | 1 | ... | 9
A sample CFG for tokens
<Token> ::= <Letter>
| <Letter> <IdC>
| <ReservedWd>
| <Digit>
| <Digit> <IntegerSS>
| - <IntegerSS>
| <Digit> . <IntegerSS>
| <Digit> <IntegerSS> . <IntegerSS>
| - <IntegerSS> . <IntegerSS>
| . <IntegerSS>
| <Digit> . | <Digit> <IntegerSS> .
| - <IntegerSS> .
…
| ""
| " <CharSeq> "
| ' <Character> '
| + | - | * | / | = | < | > | ( | ) | =+ | != | <= | >= | ++ | --
<IdC> ::= <Letter> | <Digit> | <Letter> <IdC> | <Digit> <IdC>
<IntegerSS> ::= <Digit> | <Digit> <IntegerSS>
Same grammar converted to a RRG (almost)
RRG converted to a DFA
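As a minimal illustration of the idea (not the full token DFA), the right-regular rules for <IntegerSS> correspond to a two-state automaton that can be run as a table lookup; the state names and table encoding are assumptions for this sketch:

```python
# DFA for <IntegerSS> ::= <Digit> | <Digit> <IntegerSS>
# States: 'S' (start), 'D' (accepting: one or more digits seen)
TRANSITIONS = {('S', 'digit'): 'D', ('D', 'digit'): 'D'}
ACCEPTING = {'D'}

def accepts(text):
    state = 'S'
    for ch in text:
        kind = 'digit' if ch.isdigit() else 'other'
        state = TRANSITIONS.get((state, kind))
        if state is None:      # no transition defined: reject
            return False
    return state in ACCEPTING

print(accepts('123'), accepts('12a'))  # True False
```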
A more complex example
Complex Grammar of ASPLE (lexical and syntactic)
(Rules 1–36 are as listed earlier; the grammar continues:)
37 : <constant> ::= <Boolean constant>
38 : | <int constant>
39 : <Boolean constant> ::= true
40 : | false
41 : <int constant> ::= <number>
42 : <number> ::= <digit>
43 : | <number> <digit>
44 : <id> ::= <letter>
45 : | <letter><rest id>
46 : <rest id> ::= <alphanumeric>
47 : | <alphanumeric><rest id>
46 : <digit> ::= 0 | 1 | ... | 9
47 : <letter> ::= a | b | ... | z | A | B | ... | Z
48 : <case stm> ::= case ( <exp> ) <constant case train> esac
49 : <constant case train> ::= <constant case>
50 : | <constant case> <constant case train>
51 : <constant case> ::= <int constant> : <stm train>
52 : <alphanumeric> ::= <digit>
53 : | <letter>
54 : <procedures> ::= <procedure> <procedures>
55 : |
56 : <procedure> ::= procedure <id> begin <stm train> end
• We create a new nonterminal to represent the tokens we wish to recognise:
<Token> ::= <Id> | <ReservedWd> | <Number> | <StringLit> | <CharLit> | <SS> | <MS>
• We then derive from this an RRG.
<Token> ::= begin | ; | end | bool | int | ref | ,
| call | := | if | then | fi | else
| while | do | repeat | until | input
| output | + | - | * | ( | ) | = | <= | >
| case | esac | : | procedure
| true | false | 0 | ··· | 9 | 0 <int constant> | ···
| 9 <int constant>
| A | ··· | Z | a | ··· | z
| A <rest id> | ··· | Z <rest id>
| a <rest id> | ··· | z <rest id>
<int constant> ::= 0 | ··· | 9 | 0 <int constant> | ···
| 9 <int constant>
<rest id> ::= A | ··· | Z | a | ··· | z | 0 | ··· | 9
| 0 <rest id> | ··· | 9 <rest id>
| A <rest id> | ··· | Z <rest id>
| a <rest id> | ··· | z <rest id>
The Right-regular grammar
Graph associated to the grammar
[Transition diagram omitted. From the start state: the reserved words and symbols (begin, end, bool, int, ref, call, if, then, fi, else, while, do, repeat, until, input, output, case, esac, procedure, true, false, ;, ,, +, -, *, (, ), =, :=, <=, >, :) lead directly to the accepting state; a digit 0,...,9 leads to the <int constant> state, which loops on digits; a letter A,...,Z, a,...,z leads to the <rest id> state, which loops on letters and digits; both states have λ-transitions to the accepting state.]
• ASPLE just provides a few data types, but there are others which are also very common:
• Real numbers, e.g. 3.45, .44, -5., 3.45E2, .44E-2, -5.E123
<real> ::= <fixed point>
         | <fixed point><exponent>
<integer> ::= <int constant>
            | - <int constant>
<fixed point> ::= <integer> . <int constant>
                | . <int constant>
                | <integer> .
<exponent> ::= E <integer>
• Character and strings, e.g. "", "hello world", 'a', ..., 'z'
<literal> ::= "" | " <string> "
<character> ::= ' <symbol> '
<string> ::= <symbol> | <symbol><string>
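The <real> grammar above corresponds to a single regular expression. A sketch in Python, following the grammar as given (fixed point with optional sign, optional E<integer> exponent); the name REAL_RE is of course an assumption:

```python
import re

# fixed point: digits '.' optional-digits, or '.' digits;
# optional leading '-'; optional exponent E followed by an
# (optionally negative) integer
REAL_RE = re.compile(r'-?([0-9]+\.[0-9]*|\.[0-9]+)(E-?[0-9]+)?$')

for s in ['3.45', '.44', '-5.', '3.45E2', '.44E-2', '-5.E123']:
    print(s, bool(REAL_RE.match(s)))   # all True
```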
Other patterns
Topic 3
Semantic Actions in Lexical Analysis
• The compiler can delegate some semantic tasks to the lexical analyser:
• Storing the information about the identifiers in the symbols table.
• Calculating the numeric values (in binary code) for each numeric constant.
• etc.
• These tasks vary according to:
• The objectives of the translators / interpreters.
• The division of tasks between the different components in the translator / interpreter.
Lexical analyser: Semantic actions
Previous concepts
• These actions are sometimes expressed by inserting actions between the symbols in the rules. For instance:
<id>::= actionf0 <letter> actionf1 actionf2 <rest id> actionf3
<rest id>::= <letter> actionf1 actionf2 <rest id>
| <digit> actionf1 actionf2 <rest id>
| λ
where
• actionf0 might be: initialise a counter
• actionf1 add 1 to the counter
• actionf2 copy the character which has just been recognised inside a buffer.
• actionf3 add to the buffer an end-of-string mark. Check that the number of characters is not higher than the maximum length allowed. If this happens, notify the error. Otherwise, insert the identifier in the symbols table and return a pointer to the element inside the table.
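In code form, the four actions interleave with the recognition loop roughly as follows. MAX_ID_LENGTH and the dict-based symbol table are illustrative assumptions, not part of the grammar:

```python
MAX_ID_LENGTH = 32          # assumed maximum identifier length
symbol_table = {}           # maps identifier -> symbol-table entry

def scan_id(text, i):
    """Recognise <id> starting at position i, running the semantic
    actions f0-f3 inline as the grammar symbols are matched."""
    count = 0                        # actionf0: initialise a counter
    buffer = []
    while i < len(text) and (text[i].isalnum() or text[i] == '_'):
        count += 1                   # actionf1: add 1 to the counter
        buffer.append(text[i])       # actionf2: copy the character into a buffer
        i += 1
    # actionf3: terminate the string, check the length, and insert
    # the identifier in the symbol table, returning its entry
    name = ''.join(buffer)
    if count > MAX_ID_LENGTH:
        raise ValueError('identifier too long: ' + name)
    entry = symbol_table.setdefault(name, {'name': name})
    return entry, i

print(scan_id('abc1 := 2', 0))  # ({'name': 'abc1'}, 4)
```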
Semantic actions
• Another example:
<integer>::= actiong0 <int constant>
| actiong0 -<int constant> actiong2
<int constant>::= <digit> actiong1
| <digit> actiong1 <int constant>
• where
• actiong0 can be the initialisation of an integer variable
value←0
• actiong1 performs the calculation of the value of the number which has been read so far:
value ← (10 * value) + value(digit)
• actiong2 changes the sign of the value calculated:
value← -value
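The same actions in runnable form, a sketch in which value(digit) is realised as a table lookup and the function scans a string directly:

```python
DIGIT_VALUE = {str(d): d for d in range(10)}

def scan_integer(text, i):
    """Compute the value of <integer> while scanning it (actions g0-g2)."""
    value = 0                        # actiong0: initialise the value
    negative = text[i] == '-'
    if negative:
        i += 1
    while i < len(text) and text[i].isdigit():
        # actiong1: value <- (10 * value) + value(digit)
        value = 10 * value + DIGIT_VALUE[text[i]]
        i += 1
    if negative:
        value = -value               # actiong2: change the sign
    return value, i

print(scan_integer('-123;', 0))  # (-123, 4)
```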
Three main Approaches:
1) Ad-Hoc Coding
2) Regular expressions: e.g.,
• float: “[0-9]*\.[0-9]+”
• Id: “[a-zA-Z_][a-zA-Z_0-9]*”
3) Context free grammar, converted to RRG and DFA
Token :- Id | Int | Literal | …
Id :- Alfa | Alfa Id2
Id2 :- Alfa | Digit | Alfa Id2 | Digit Id2
…
Lexical Analyser: Summary