1 terminology l statement ( 敘述 ) »declaration, assignment containing expression ( 運算式 ) l...
Post on 21-Dec-2015
222 views
TRANSCRIPT
1
Terminology
Statement ( 敘述 )» declaration, assignment containing expression ( 運算式 )
Grammar ( 文法 )» a set of rules specify the form of legal statements
Syntax ( 語法 ) vs. Semantics ( 語意 )» Example: assuming J,K:integer and X,Y:float» I:=J+K vs I:=X+Y
Compilation: 編譯 » matching statements written by the programmer to structure
s defined by the grammar and generating the appropriate object code
2
System Software
Assembler Loader and Linker Macro Processor Compiler Operating System Other System Software
» RDBS» Text Editors» Interactive Debugging System
3
Basic Compiler
Lexical analysis -- scanner» scanning the source statement, recognizing and classifying
the various tokens
Syntactic analysis -- parser» recognizing the statement as some language construct
Code generation --
5
Parser
Grammar: a set of rules » Backus-Naur Form (BNF) » Ex: Figure 5.2
Terminology» Define symbol ::=» Nonterminal symbols <>» Alternative symbols |» Terminal symbols
7
Parser
<read> ::= READ (<id-list>) <id-list>::= id | <id-list>,id
<assign>::= id := <exp> <exp> ::= <term> |
<exp>+<term> | <exp>-<term>
<term>::=<factor> | <term>*<factor> | <term> DIV <factor>
<factor>::= id | int | <exp>
READ(VALUE)
SUM := 0
SUM := SUM + VALUE
MEAN := SUM DIV 100
10
Lexical Analysis
Function» scanning the program to be compiled and recognizing the tokens that make up the source statements
Tokens» Tokens can be keywords, operators, identifiers, integers, flo
ating-point numbers, character strings, etc.» Each token is usually represented by some fixed-length cod
e, such as an integer, rather than as a variable-length character string (see Figure 5.5)
» Token type, Token specifier (value) (see Figure 5.6)
12
Example - Figure 5.6
Statement Token type Token specifierPROGRAM STATS 1
22 ^STATSVAR 2SUM,SUMSQ,I, …, : INTEGER 22 ^SUM
1422 ^SUMSQ
14…..
1422 ^VARIANCE
136
BEGIN 3SUM:=0; 22 ^SUM
1523 #0
13
Token Recognizer
By grammar» <ident> ::= <letter> | <ident> <letter>| <ident><digit>» <letter> ::= A | B | C | D | … | Z» <digit> ::= 0 | 1 | 2 | 3 | … | 9
By scanner - modeling as finite automata» Figure 5.8(a)
14
Recognizing Identifier
Identifiers allowing underscore (_)» Figure 5.8(b)
State A-Z 0-9 _1 22 2 2 33 2 2
1 2A-Z 2
A-Z0-9
3
A-Z0-9
-
15
Recognizing Integer
Allowing leading zeroes» Figure 5.8(c)
Disallowing leading zeroes» Figure 5.8(d)
1 20-9 2
0-9
1 21-9 2
0-9
3
0
4space 4
16
Scanner -- Implementation
Figure 5.10 (a)» Algorithmic code for identifer recognition
Tabular representation of finite automaton for Figure 5.9
State A-Z 0-9 ;,+-*() : = .
1 2 4 5 62 2 2 334 456 77
17
Syntactic Analysis
Recognize source statements as language constructs or build the parse tree for the statements» bottom-up: operator-precedence parsing» top-down:: recursive-descent parsing
18
Operator-Precedence Parsing
Operator» any terminal symbol (or any token)
Precedence» * » +» + « *
Operator-precedence» precedence relations between operators
(i) … id1 := id2 DIV
(ii) … id1 := <N1> DIV int -
(iii) … id1 := <N1> DIV <N2> -
(iv) … id1 := <N3> - id3 *
(v) … id1 := <N3> - <N4> * id4 ;
25
Recursive-Descent Parsing
Each nonterminal symbol in the grammar is associated with a procedure
<read> ::= READ (<id-list>) <stmt> ::= <assign> | <read> | <write> | <for> Left recursion
» <dec-list> ::= <dec> | <dec-list>;<dec>
Modification» <dec-list> ::= <dec> {;<dec>}