SYSTEM PROGRAMMING HANDOUT#09


Upload: sunita-aher

Post on 07-Jan-2017


System Programming

Sunita M. Dol, CSE Dept

Walchand Institute of Technology, Solapur

HANDOUT#09

Aim:

Design lexical analyzer for tokens: keywords, identifiers, numbers, and operators.

Theory:

Figure 1 depicts the schematic of a two pass language processor. The first pass performs analysis of the source program and reflects its results in the intermediate representation. The second pass reads and analyses the IR, instead of the source program, to perform synthesis of the target program. This avoids repeated processing of the source program.

Figure 1: Two pass schematic for language processing

The Front End

The front end performs

• Lexical analysis,

• Syntax analysis and

• Semantic analysis of the source program.

Each kind of analysis involves the following functions:

1. Determine validity of a source statement from the viewpoint of the analysis.

2. Determine the ‘content’ of a source statement.

3. Construct a suitable representation of the source statement for use by subsequent analysis functions, or by the synthesis phase of the language processor.

The word ‘content’ has different connotations in lexical, syntax and semantic analysis.

• In lexical analysis, the content is the lexical class to which each lexical unit belongs.

• In syntax analysis, it is the syntactic structure of a source statement.

• In semantic analysis, the content is the meaning of a statement: for a declaration statement, it is the set of attributes of a declared variable (e.g. type, length and dimensionality), while for an imperative statement, it is the sequence of actions implied by the statement.

Figure: Front end of the toy compiler

Output of the front end

The IR produced by the front end consists of two components:

1. Tables of information

2. An intermediate code (IC) which is a description of the source program.
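The two components of the IR can be pictured with a small C sketch. It is an illustration only, not the handout's toy compiler: the `Symbol`, `Quad` and `IR` structures, the helper names `add_symbol` and `emit`, and the quadruple (operator, two operands, result) form of the intermediate code are all assumptions chosen for brevity.

```c
#include <string.h>

enum { MAX_SYMS = 16, MAX_CODE = 16 };

/* Table of information: one entry per declared name (hypothetical layout). */
struct Symbol {
    char name[16];
    char type[8];
};

/* One intermediate-code entry in quadruple form (an assumed representation).
   Operands and result are symbol table indices; -1 means "unused". */
struct Quad {
    char op;
    int arg1, arg2, result;
};

/* The IR: tables of information plus the intermediate code. */
struct IR {
    struct Symbol symtab[MAX_SYMS];
    int nsyms;
    struct Quad code[MAX_CODE];
    int ncode;
};

/* Adds a name to the symbol table; returns its index, or -1 if the table is full. */
int add_symbol(struct IR *ir, const char *name, const char *type)
{
    if (ir->nsyms == MAX_SYMS)
        return -1;
    struct Symbol *s = &ir->symtab[ir->nsyms];
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    strncpy(s->type, type, sizeof s->type - 1);
    s->type[sizeof s->type - 1] = '\0';
    return ir->nsyms++;
}

/* Appends one quadruple; returns its index, or -1 if the code area is full. */
int emit(struct IR *ir, char op, int arg1, int arg2, int result)
{
    if (ir->ncode == MAX_CODE)
        return -1;
    struct Quad q = { op, arg1, arg2, result };
    ir->code[ir->ncode] = q;
    return ir->ncode++;
}
```

Starting from a zero-initialised `struct IR`, a statement such as a := b+i; could be recorded by adding the names a, b, i and a temporary to the table, then emitting one quadruple for the addition and one for the assignment. The variable types would come from the declarations, which this sketch simply takes as given.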


The Back End

The back end performs

• Memory allocation: Memory allocation is a simple task given the presence of the symbol table. The memory requirement of an identifier is computed from its type, length and dimensionality, and memory is allocated to it. The address of the memory area is entered in the symbol table.

• Code generation: Code generation uses knowledge of the target architecture, viz. knowledge of instructions and addressing modes in the target computer, to select the appropriate instructions. The important issues in code generation are:

  o Determine the places where the intermediate results should be kept, i.e. whether they should be kept in memory locations or held in machine registers. This is a preparatory step for code generation.

  o Determine which instructions should be used for type conversion operations.

  o Determine which addressing modes should be used for accessing variables.
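The memory allocation step described above can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the handout's code: the `SymEntry` layout, the element sizes returned by `type_length`, and the function `allocate_memory` are hypothetical, and dimensionality is simplified to a total element count.

```c
#include <stddef.h>

/* Hypothetical type codes for the sketch. */
enum SymType { TYPE_CHAR, TYPE_INT, TYPE_REAL };

/* One symbol table entry (assumed layout). */
struct SymEntry {
    const char *name;
    enum SymType type;
    size_t elements;   /* total number of elements; 1 for a scalar */
    size_t address;    /* filled in by allocate_memory */
};

/* Length in bytes of one element; the sizes are assumed for illustration. */
static size_t type_length(enum SymType t)
{
    switch (t) {
    case TYPE_CHAR: return 1;
    case TYPE_INT:  return 2;
    case TYPE_REAL: return 4;
    }
    return 0;
}

/* Computes each identifier's memory requirement from its type, length and
   element count, assigns consecutive addresses starting at base, and enters
   each address in the table. Returns the first free address after the last
   identifier. */
size_t allocate_memory(struct SymEntry *table, size_t n, size_t base)
{
    size_t next = base;
    for (size_t i = 0; i < n; i++) {
        table[i].address = next;
        next += type_length(table[i].type) * table[i].elements;
    }
    return next;
}
```

Alignment is ignored here; a real back end would normally round each address up to the alignment the target architecture requires.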

Figure: Back end of the toy compiler

Lexical analysis (Scanning)

Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. id's, constants, reserved id's, etc., and enters them into different tables. Lexical analysis builds a descriptor, called a token, for each lexical unit. A token contains two fields: class code and number in class. Class code identifies the class to which a lexical unit belongs; number in class is the entry number of the lexical unit in the relevant table.

Example: The statement a := b+i; is represented as a string of tokens.

Input:

TRIAL.CPP


#include <stdio.h>
#include <conio.h>
void main()
{
    int num1=5, count=1, ab=10;
    char ch;
    printf("This is a trial program");
    while (count != num1)
    {
        ab = ab*count/2;
        if (count==3)
            count = count+1;
        printf("AB %d", ab);
    }
}

Output:

Token ID=0 # Special Character

Token ID=1 include Keyword type

Token ID=2 < Special Character

Token ID=3 stdio Keyword type

Token ID=4 . Special Character

Token ID=5 h Identifier type

Token ID=6 > Special Character

Token ID=7 # Special Character

Token ID=8 include Keyword type

Token ID=9 < Special Character

Token ID=10 conio Keyword type

Token ID=11 . Special Character

Token ID=12 h Identifier type

Token ID=13 > Special Character


Token ID=14 void Keyword type

Token ID=15 main Keyword type

Token ID=16 ( Special Character

Token ID=17 ) Special Character

Token ID=18 { Special Character

Token ID=19 int Keyword type

Token ID=20 num1 Identifier type

Token ID=21 = Operator type

Token ID=22 5 Numeric type

Token ID=23 , Special Character

Token ID=24 count Identifier type

Token ID=25 = Operator type

Token ID=26 1 Numeric type

Token ID=27 , Special Character

Token ID=28 ab Identifier type

Token ID=29 = Operator type

Token ID=30 10 Numeric type

Token ID=31 ; Special Character

Token ID=32 char Keyword type

Token ID=33 ch Identifier type

Token ID=34 ; Special Character

Token ID=35 printf Keyword type

Token ID=36 ( Special Character

Token ID=37 This is a trial program Literal type

Token ID=38 ) Special Character

Token ID=39 ; Special Character

Token ID=40 while Keyword type

Token ID=41 ( Special Character

Token ID=42 count Identifier type

Token ID=43 ! Special Character

Token ID=44 = Operator type

Token ID=45 num1 Identifier type

Token ID=46 ) Special Character

Token ID=47 { Special Character

Token ID=48 ab Identifier type


Token ID=49 = Operator type

Token ID=50 ab Identifier type

Token ID=51 * Operator type

Token ID=52 count Identifier type

Token ID=53 / Operator type

Token ID=54 2 Numeric type

Token ID=55 ; Special Character

Token ID=56 if Keyword type

Token ID=57 ( Special Character

Token ID=58 count Identifier type

Token ID=59 = Operator type

Token ID=60 = Operator type

Token ID=61 3 Numeric type

Token ID=62 ) Special Character

Token ID=63 count Identifier type

Token ID=64 = Operator type

Token ID=65 count Identifier type

Token ID=66 + Operator type

Token ID=67 1 Numeric type

Token ID=68 ; Special Character

Token ID=69 printf Keyword type

Token ID=70 ( Special Character

Token ID=71 AB %d Literal type

Token ID=72 , Special Character

Token ID=73 ab Identifier type

Token ID=74 ) Special Character

Token ID=75 ; Special Character

Token ID=76 } Special Character

Token ID=77 } Special Character
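The handout does not reproduce the analyser's source, so the following is only a sketch of a scanner that would classify lexical units the way the listing above does. The names `TokClass`, `keywords` and `next_token` are hypothetical. The keyword table is an assumption chosen to match the listing, which reports include, stdio, conio, main and printf as keywords, as is the choice to treat only + - * / = as operators and every other punctuation character as a special character.

```c
#include <ctype.h>
#include <string.h>

enum TokClass { T_END, T_KEYWORD, T_IDENTIFIER, T_NUMBER,
                T_OPERATOR, T_SPECIAL, T_LITERAL };

/* Assumed keyword table, chosen so the classes match the listing. */
static const char *const keywords[] = {
    "include", "stdio", "conio", "void", "main",
    "int", "char", "printf", "while", "if"
};

static int is_keyword(const char *s)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0)
            return 1;
    return 0;
}

/* Reads one lexical unit starting at *src, copies it into lexeme
   (truncated to cap-1 characters; cap must be at least 1), advances
   *src past the unit and returns its class. */
enum TokClass next_token(const char **src, char *lexeme, size_t cap)
{
    const char *p = *src;
    size_t n = 0;
    enum TokClass cls;

    while (isspace((unsigned char)*p))
        p++;

    if (*p == '\0') {
        cls = T_END;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n + 1 < cap) lexeme[n++] = *p;
            p++;
        }
        cls = T_IDENTIFIER;             /* may become T_KEYWORD below */
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) {
            if (n + 1 < cap) lexeme[n++] = *p;
            p++;
        }
        cls = T_NUMBER;
    } else if (*p == '"') {
        p++;                            /* skip the opening quote */
        while (*p != '\0' && *p != '"') {
            if (n + 1 < cap) lexeme[n++] = *p;
            p++;
        }
        if (*p == '"') p++;             /* skip the closing quote */
        cls = T_LITERAL;
    } else {
        /* Single-character unit: operator or special character. */
        cls = strchr("+-*/=", *p) ? T_OPERATOR : T_SPECIAL;
        if (n + 1 < cap) lexeme[n++] = *p;
        p++;
    }

    lexeme[n] = '\0';
    if (cls == T_IDENTIFIER && is_keyword(lexeme))
        cls = T_KEYWORD;
    *src = p;
    return cls;
}
```

Calling `next_token` in a loop until it returns `T_END`, with a counter incremented per call, yields the running token ID, the lexeme and the class shown in each line of the listing. Like the listing, the sketch reports `!=` and `==` as two separate single-character units; it does not handle escape sequences inside string literals or comments.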

Conclusion:

• In lexical analysis, the content is the lexical class to which each lexical unit belongs.

• Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. id's, constants, reserved id's, etc., and enters them into different tables.

• Lexical analysis builds a descriptor, called a token, for each lexical unit.

• Thus the lexical analyser is implemented in the C language.
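The descriptor and the per-class tables summarised above can be sketched in C as below. This is a hedged illustration rather than the handout's implementation: the class codes, the fixed table sizes and the names `Table`, `Token`, `enter` and `make_token` are assumptions. Each token holds a class code and the entry number of its lexical unit in the table for that class.

```c
#include <string.h>

/* Hypothetical class codes; each one also indexes the table for its class. */
enum ClassCode { CLASS_ID, CLASS_CONST, CLASS_OP };

enum { MAX_ENTRIES = 32, MAX_LEN = 32 };

/* One table per lexical class (sizes are assumed). */
struct Table {
    char entries[MAX_ENTRIES][MAX_LEN];
    int count;
};

/* Token descriptor: class code plus number in class. */
struct Token {
    enum ClassCode class_code;
    int number;   /* 1-based entry number in the class table; 0 if the table is full */
};

/* Returns the entry number of lexeme in the table, adding it if absent. */
static int enter(struct Table *t, const char *lexeme)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->entries[i], lexeme) == 0)
            return i + 1;
    if (t->count == MAX_ENTRIES)
        return 0;
    strncpy(t->entries[t->count], lexeme, MAX_LEN - 1);
    t->entries[t->count][MAX_LEN - 1] = '\0';
    return ++t->count;
}

/* Builds the descriptor for one lexical unit of the given class.
   The tables array must be zero-initialised before first use. */
struct Token make_token(struct Table tables[], enum ClassCode code,
                        const char *lexeme)
{
    struct Token tok = { code, enter(&tables[code], lexeme) };
    return tok;
}
```

Because `enter` searches before it adds, a repeated identifier maps to the same entry number each time it appears, so later phases can refer to the lexical unit by its (class code, number) pair instead of by its text.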