Self Learning Material
System Programming(MCA-305A)
Course: Masters in Computer Applications
Semester-III
Distance Education Programme
I.K. Gujral Punjab Technical University
Jalandhar
Syllabus

I.K. Gujral Punjab Technical University
MCA-305A System Programming
Section-A
Assemblers and Macro Processors: Language processors, data structures for language processing, general design procedure, single-pass and two-pass assemblers and their algorithms, assembly language specifications (example: MASM). Macro instructions, features of a macro facility: macro instruction arguments, conditional macro expansion, macro calls within macros.
Section-B
Loaders and Linkers & Editors: Loader schemes: compile-and-go loader, general loader scheme, absolute loaders, subroutine linkages, relocating loaders, direct linking loaders, relocation, design of absolute loader, bootstrap loaders, dynamic linking, MS-DOS linker, text editors, line editors, stream editors, screen editors, word processors, structure editors.
Section-B

Compiler Design: Introduction to various translators, interpreters, debuggers, various phases of a compiler, introduction to grammars and finite automata, bootstrapping for compilers, lexical analysis and syntax analysis, intermediate code generation, code optimization techniques, code generation, introduction to YACC, just-in-time compilers, platform-independent systems.
Section-D

Operating System: Operating systems and their functions, types of operating systems: real-time OS, distributed OS, mobile OS, network OS, booting techniques and subroutines, I/O programming, introduction to device drivers, USB and Plug and Play systems, systems programming (APIs).
TEXTBOOKS:
• Donovan J.J., Systems Programming, New York, McGraw-Hill, 1972.
• Leland L. Beck, System Software, San Diego State University, Pearson Education, 1997.
• Dhamdhere, D.M., System Programming and Operating Systems, Tata McGraw-Hill, 1996.
REFERENCES:
1. Aho A.V. and J.D. Ullman, Principles of Compiler Design, Addison-Wesley/Narosa, 1985.
Table of Contents

Chapter 1: Fundamental of Assembler (Mr. Mandeep Kumar, Assistant Professor, CEM, Kapurthala)
Chapter 2: Design Procedure: Assemblers (Mr. Mandeep Kumar, Assistant Professor, CEM, Kapurthala)
Chapter 3: Assembly Language (Ms. Sajpreet Kaur, Assistant Professor, DAV University, Jalandhar)
Chapter 4: Macro Instructions (Ms. Sajpreet Kaur, Assistant Professor, DAV University, Jalandhar)
Chapter 5: Loaders (Mr. Manpreet Singh, Assistant Professor, CTIT, Shahpur, Jalandhar)
Chapter 6: Linkers (Mr. Manpreet Singh, Assistant Professor, CTIT, Shahpur, Jalandhar)
Chapter 7: Editors (Mr. Manpreet Singh, Assistant Professor, CTIT, Shahpur, Jalandhar)
Chapter 8: Fundamentals of Compiler Design (Mr. Abhinav Hans, Assistant Professor, DAVIET, Jalandhar)
Chapter 9: Finite Automata and Grammar (Mr. Abhinav Hans, Assistant Professor, DAVIET, Jalandhar)
Chapter 10: Phases of Compiler Design (Mr. Abhinav Hans, Assistant Professor, DAVIET, Jalandhar)
Chapter 11: YACC (Mr. Abhinav Hans, Assistant Professor, DAVIET, Jalandhar)
Chapter 12: Fundamentals of OS (Mr. Sandeep Sood, GNDU, Amritsar)
Chapter 13: Booting Techniques and Device Drivers (Mr. Sandeep Sood, Assistant Professor, GNDU, Amritsar)
Chapter 14: System Programming API (Mr. Sandeep Sood, Assistant Professor, GNDU, Amritsar)
Reviewed by Mr. Palvinder Singh Mann
Assistant Professor, DAVIET
© IK Gujral Punjab Technical University, Jalandhar
All rights reserved with IK Gujral Punjab Technical University, Jalandhar
Lesson 1 Fundamental of Assembler
Structure of the Chapter
1.0 Objective
1.1 Introduction
1.2 Assembler
1.3 Macro Processor
1.4 Language Processor
1.5 Data Structure used in Language Processor
1.6 Summary
1.7 Glossary
1.8 Answers to check your progress/self assessment questions
1.9 References/ Suggested Readings
1.10 Model Questions
1.0 Objective
After studying this chapter, students will be able to:
- Explain assemblers and macro processors.
- Describe language processors.
- Describe the different data structures used in language processors.
1.1 Introduction
The assembler is one of the fundamental components of any processing system. In this chapter, the assembler, the macro processor, the language processor, and the data structures used in language processors will be discussed.
1.2 Assembler
An assembler is a computer program that translates a software program written in assembly language into machine language that can be executed by a computer. An assembler can be regarded as the compiler of assembly language. Assembly language is specific to a particular computer architecture.
Typically, assemblers make two passes over the assembly language code:

1. First pass: read each line and record labels in a symbol table.
2. Second pass: use the information in the symbol table to produce the actual machine code for each line.

A complete chapter on the assembler will discuss it in detail later.
Fig 1.1 Role of Assembler
1.3 Macro Processor
A macro is a commonly used group of statements in the source programming language. A macro instruction (or macro) is a convenience for the programmer's use: it allows the programmer to write a short version of a program (modular programming). The macro processor is a program that replaces each macro instruction with the corresponding group of source-language statements; it performs no analysis of the program.
Basic macro processor functions: the design of the macro processor is machine independent. Two assembler directives are used in a macro definition:

1. MACRO: marks the beginning of a macro definition.
2. MEND: marks the end of a macro definition.
Syntax for a macro (each parameter starts with '&'):

NAME    MACRO   &parameters
        .
        .       (body)
        .
        MEND

Example:

SOURCE PROGRAM:

AS      MACRO
        STA  D1
        STB  D2
        MEND
        .
        AS
        .
        AS
        .

AS is a macro with no arguments. The macro stores the contents of register A in D1 and the contents of register B in D2.

EXPANDED SOURCE PROGRAM:

        .
        .
        STA  D1
        STB  D2
        .
        STA  D1
        STB  D2
A complete chapter on the macro processor will discuss it in detail later.
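To make the substitution concrete, here is a minimal Python sketch of a no-argument macro expander. The line format and data structures are illustrative assumptions for this chapter's example, not Donovan's actual design; real macro processors also handle arguments and nesting.

```python
def expand_macros(lines):
    """Expand no-argument macros defined between MACRO and MEND.

    A line 'NAME MACRO' starts a definition; 'MEND' ends it.  Any later
    line consisting only of a defined macro name is replaced by the stored
    body: a pure text substitution, with no analysis of the program.
    """
    macros = {}            # macro name -> list of body lines
    output = []
    name, body = None, None
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1] == "MACRO":
            name, body = parts[0], []           # start of a definition
        elif line.strip() == "MEND":
            macros[name] = body                 # end of the definition
            name = None
        elif name is not None:
            body.append(line)                   # inside a definition
        elif line.strip() in macros:
            output.extend(macros[line.strip()]) # macro call: substitute body
        else:
            output.append(line)
    return output

source = ["AS MACRO", "STA D1", "STB D2", "MEND", "AS", "AS"]
print(expand_macros(source))   # each AS call is replaced by the macro body
```

Running this on the AS example above yields the expanded source program shown earlier: the body STA D1 / STB D2 appears once per call.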
Check Your Progress/ Self assessment Questions
Q1. What is the need of an assembler?
………………………………………………………………………………………………………
…………………………………………………………………………………………………….
……………………………………………………………………………………………………..
Q2. ________ generation computers use assembly language.
a. First generation
b. Third generation
c. Second generation
d. Fourth generation
Q3. The assembler works to convert an assembly language program into machine language:
a. Before the computer can execute it
b. After the computer can execute it
c. In between execution
d. All of these
Q4. An assembly language program is called a(n):
a. Object program
b. Source program
c. Oriented program
d. All of these
1.4 Language Processor

Language processing activities come into play because of the difference between the manner in which a software designer describes the intended behaviour of software and the manner in which these ideas are implemented in a computer system. For example, if we write code in the C++ language, we need something to convert the code into machine-readable form; this is done by an intermediary called the C++ compiler, i.e. a language processor.
The types of language processors are:

1. Language translator: this type of language processor converts a higher-level language to machine-level language. Examples of this type of processor are assemblers and compilers. Some examples of translators are:
(a) English-to-French interpreter: this program translates English to French. The source language remains the same in both cases.
(b) Java-to-C translator: this program translates Java source code to C source code.

2. Detranslator: a type of language processor that takes object code at a low level and regenerates the source code at a higher level. An example of this is a disassembler.

3. Preprocessor: it performs a simple text substitution before translation takes place. An example of this type of processor is the macro, which we studied in the previous topic. System-specific languages like C and C++ have preprocessors to directly process their variables and functions.
1.5 Data Structures Used by the Assembler

First, we should know what a data structure is before starting this topic. A data structure is a way of organizing data so that it can be used efficiently.
The various type of data structures used by assembler are:
1. Symbol Table (SYMTAB)
2. Literal Table (LITTAB)
3. Mnemonic Table or Machine Operation Table (MOT)
4. Pseudo-Opcode Table (POT) or Operation Code Table (OPTAB)
5. Location Counter (LC)
6. Pool Table (POOLTAB)
1. Symbol Table (SYMTAB): The symbol table contains all the symbols used in the program, along with the location and, if any, the value of each symbol. Symbols stored in the SYMTAB include variables, labels, etc. It also contains flags that indicate error messages.

During Pass 1: labels and variables are entered into the symbol table along with their assigned address values as they are encountered. All symbol addresses and values should be resolved by the end of Pass 1.

During Pass 2: symbols used as operands are looked up in the symbol table to get the address values to be inserted in the assembled instructions.
SYMTAB
SYMBOL VALUE LOCATION LENGTH
2. Literal Table (LITTAB): The literal table stores all the information about the literals used in the assembly program, including each literal's name and the location assigned to it. The LITTAB is created by the analysis phase and used by the synthesis phase to generate machine code.
3. Mnemonic Table: The mnemonic table is also known as the machine operation table (MOT). Its contents remain the same during the lifetime of the assembler; no one can make changes to it, so it remains fixed. When we use a mnemonic code, the assembler substitutes its opcode value during conversion to machine language. Examples of mnemonics are ADD, SUB, MUL, etc.; the MOT defines each mnemonic's opcode value and length, which helps in using it easily and effectively.
4. Pseudo-Opcode Table (POT) or Operation Code Table (OPTAB): This is also a fixed table and remains the same during the lifetime of the assembler, just like the mnemonic table. The OPTAB contains 3 fields: mnemonic opcode, class, and mnemonic information.

The mnemonic opcode is the same as in the mnemonic table. The class field indicates whether the opcode relates to an imperative statement (IS), a declarative statement (DS), or an assembler directive (AD).
5. Location Counter (LC): The location counter is a variable which keeps track of the location of the present instruction; it is initialized with the starting address of the program. For example,

START 100

Now the LC has the value 100, meaning the starting address of the program is 100.
6. Pool Table (POOLTAB): When multiple LTORG statements are used in a program, the assembler creates a separate literal pool for each LTORG statement. The POOLTAB contains information about the different literal pools. The number of pools is stored in the n_pool variable.
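The interplay of the LC and the SYMTAB described above can be sketched in Python. The program, the labels, and the per-statement sizes here are hypothetical stand-ins chosen only to show LC processing, not a real assembler:

```python
# Hypothetical program: (label, statement) pairs following a START directive.
program = [("", "START 100"), ("", "MOVER"), ("LOOP", "ADD"), ("", "STOP")]

SIZE = {"MOVER": 1, "ADD": 1, "STOP": 1}   # assumed instruction lengths
SYMTAB = {}
LC = 0

for label, stmt in program:
    if stmt.startswith("START"):
        LC = int(stmt.split()[1])      # LC initialised from START's operand
        continue
    if label:
        SYMTAB[label] = LC             # label entered with its current address
    LC += SIZE[stmt]                   # advance LC by the statement's size

print(SYMTAB, LC)   # {'LOOP': 101} 103
```

The label LOOP receives address 101 because one statement (MOVER) precedes it after START 100; this is exactly the "LC processing" the chapter describes.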
Check Your Progress/ Self assessment Questions
Q5 What is the use of Literal Table?
………………………………………………………………………………………………………
…………………………………………………………………………………………………….
……………………………………………………………………………………………………..
Q6 Mnemonic refers to:
a. Instructions
b. Code
c. Symbolic codes
d. Assembler
Q7. A _______ processor controls the repetitious writing of a sequence:
a. Macro
b. Micro
c. Nano
d. All of these
1.6 Summary
- An assembler is a computer program that translates a software program written in assembly language into machine language that can be executed by a computer.
- A macro is a commonly used group of statements in the source programming language.
- The macro processor is a program that replaces each macro instruction with the corresponding group of source-language statements. It performs no analysis of the program.
- A language translator converts a higher-level language to machine-level language.
- A detranslator is a type of language processor that takes object code at a low level and regenerates the source code at a higher level.
- A preprocessor performs a simple text substitution before translation takes place.
1.7 Glossary
The important terms discussed in this chapter are:
- Assembler: converts code written in assembly language into machine language.
- Assembly language: a low-level programming language which is machine dependent.
- Macro: a commonly used group of statements in the source programming language.
- Macro processor: a program that replaces each macro instruction with the corresponding group of source-language statements; it performs no analysis of the program.
- Language processor: acts like a bridge between two different languages and makes them communicate.
- Data structure: a way of arranging and storing data so that it can be retrieved efficiently.
1.8 Answers to Self Assessment Questions
1. It is a translator that converts the code written in assembly language into machine language.
2. c) Second generation
3. a) Before the computer can execute it
4. b)Source program
5. The literal table stores all the information about the literals used in the assembly program, including each literal's name and the location assigned to it. The LITTAB is created by the analysis phase and used by the synthesis phase to generate machine code.
6. c) Symbolic codes
7. a) Macro
1.9 References
1. System Software, Charanjeet Singh, Kalyani publishers.
2. System Programming, Donovan, 2nd Edition, Tata McGraw-Hill Education.
3. Systems Programming, D M Dhamdhere, Tata McGraw-Hill Education, 2011.
1.10 Model Questions
Q1. How many passes are there in an assembler? Elaborate.
Q2. What is MACRO PROCESSOR and what are its functions?
Q3. What are the different data structures used in a language processor?
Q4. What is language processor and what are its types?
Lesson 2 Design Procedure: Assemblers
2.0 Objective
2.1 Introduction
2.2 Assemblers
2.3 Assembly Language
2.4 Format of Assembly Language
2.5 Constants and literals
2.6 Assembly Scheme
2.6.1 Phases of Assembler
2.7 One Pass Assembler
2.8 Two Pass Assembler
2.9 Summary
2.10 Glossary
2.11 Answers to check your progress/self assessment questions
2.12 References/ Suggested Readings
2.13 Model Questions
2.0 Objective
After studying this chapter, students will be able to:
- Write assembly language statements
- Learn how an assembler works
- Learn how to design their own assembler
2.1 Introduction
In this chapter, the assembler, assembly language, writing instructions in assembly language, how an assembler works, and how to design one's own assembler will be discussed. Some algorithms that can be used to create assemblers will also be discussed.
2.2 Assembler

Let us first consider an example to understand the term assembler. Suppose there are two persons, one who can speak only English and one who can speak only Spanish, and they want to communicate with each other. The problem is that neither can understand what the other is saying. To solve this problem, they bring in a third person who understands English as well as Spanish, so that he can translate between their languages. The same is the case with our computer and programming languages. Our computer can understand only binary language, which is too hard for us to understand; we understand the English language very well, but it is not known to our computer. So we need a translator, and here the role of the assembler comes into play. The assembler takes our code written in assembly language (the source program) and converts it into binary language so that the computer can understand it. The output produced by the computer is in binary language; the assembler takes it from the computer and converts it into assembly language, giving us output that we can understand.
Fig 2.1 Working of Assembler
2.2.1 Tasks performed by Assembler:-
- Translate mnemonic codes into binary language.
- Notify errors if present in the source code.
- Produce information for linkers and loaders.
- Assign machine addresses to all the labels used in the program.
2.3 Assembly Language

Assembly language is a low-level programming language which is machine dependent. It consists of a number of mnemonics that represent the operations to be performed; e.g. the mnemonic ADD represents that the programmer wants to perform addition. A machine cannot understand assembly language directly: an assembler is required to convert assembly language into binary language and vice versa.
2.4 Format of Assembly Language
An assembly language statement generally consists of 4 components.
[Label] <opcode><operand specification> [<operand specification>…..]
Description of the various components:
1. [...]: optional field
2. Label: name given to a memory location
3. Opcode: mnemonic code
4. Operand specification: operand type, like register, memory, etc.
Note: the first operand of an assembly language statement is always a register, like AREG, BREG, CREG, etc. The second operand of an assembly language statement can be a register, a memory location, an immediate value, etc.
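Splitting a statement into these components can be sketched in Python. The parsing rule used here (the first token is a label only if it is not a known mnemonic) is a simplification assumed for illustration; real assemblers usually rely on column position or a delimiter:

```python
# Mnemonics from this chapter's opcode table.
OPCODES = {"STOP", "ADD", "SUB", "MULT", "MOVER", "MOVEM",
           "COMP", "BC", "DIV", "READ", "PRINT"}

def parse(statement):
    """Split an assembly statement into (label, opcode, operands).

    Simplification: the first token is treated as a label only when it
    is not a known mnemonic.
    """
    tokens = statement.replace(",", " ").split()
    label = None
    if tokens[0] not in OPCODES:
        label = tokens.pop(0)
    opcode, operands = tokens[0], tokens[1:]
    return label, opcode, operands

print(parse("AGAIN MOVER AREG, ONE"))   # ('AGAIN', 'MOVER', ['AREG', 'ONE'])
print(parse("STOP"))                    # (None, 'STOP', [])
```

Note how the first operand AREG is a register and the second, ONE, names a memory location, matching the rule stated above.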
2.4.1 Various mnemonic codes, their meaning, opcode value and example:

Mnemonic   Opcode Value   Meaning                           Example
STOP       00             Stops the execution               STOP
ADD        01             Addition                          ADD AREG ONE
SUB        02             Subtraction                       SUB AREG ONE
MULT       03             Multiplication                    MULT AREG ONE
MOVER      04             Move memory content to register   MOVER AREG ONE
MOVEM      05             Move register content to memory   MOVEM AREG ONE
COMP       06             Comparison instruction            COMP AREG ONE
BC         07             Branch on condition               BC LT 2000
DIV        08             Division                          DIV AREG ONE
READ       09             To read data                      READ ONE
PRINT      10             To print data                     PRINT ONE
2.4.2 Condition Code Table

Code   Mnemonic   Meaning
1      LT         Less than
2      LE         Less than or equal to
3      EQ         Equal to
4      GT         Greater than
5      GE         Greater than or equal to
6      ANY        Unconditional control transfer
2.4.3 Machine Instruction Format

As we know, an assembly language statement has three parts: opcode, first operand, and second operand. Similarly, a machine language statement also has three parts: in machine language, the opcode occupies 2 positions, the register 1 position, and the memory address 3 positions.
2.4.4 Type of Assembly Language Statements
Assembly Language statements are of three types:
Imperative statements:
1. Imperative statements are those statements that tell the processor what operation is to be performed.
2. Examples of imperative statements are ADD, SUB, MOVER, etc.

Declarative statements:
1. These statements are used either to declare storage or to declare a constant.
2. There are only two declarative statements in assembly language, i.e. DS and DC.
DS: DS stands for Declare Storage. As the name describes, this statement reserves a block of memory. In fact, the statement simply puts a label on the particular memory location that you provide. For example,
A DS 10
This statement will reserve a block of 10 memory words with the label A. On accessing A directly you will access the first word of the reserved 10 words. The other words can be accessed by using offsets: for example, to access the 6th word you write A+5.
DC: DC stands for Declare Constant. This statement reserves a word in memory and assigns a constant value to it. To use the stored constant value in our program, we give it a label.
TWO DC ‘2’
You can store constants in decimal, hexadecimal, or any other form. The assembler will convert the value into binary form and then store it in memory.
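The conversion the assembler performs when storing a DC constant can be illustrated with Python's base conversions; the 8-bit word width below is an assumption for the example:

```python
def to_binary(text, base=10, width=8):
    """Convert a constant's textual form to a fixed-width bit string,
    as an assembler does before storing a DC value in memory."""
    return format(int(text, base), "0{}b".format(width))

print(to_binary("2"))            # TWO DC '2'  -> '00000010'
print(to_binary("1A", base=16))  # a hexadecimal constant -> '00011010'
```

Whatever textual form the programmer chooses, the stored representation is the same binary word.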
Assembler directives: As the name describes, assembler directive statements are used to direct the assembler to perform particular tasks during translation of the source code into destination code. No memory space is reserved for assembler directives. Some of the assembler directives are START, END, etc.
Check Your Progress/ Self assessment Questions
Q1 Discuss the format of Assembly Language.
………………………………………………………………………………………………
………………………………………………………………………………………………..

Q2. Why are declarative statements used?
………………………………………………………………………………………………
………………………………………………………………………………………………..
Q3 Assembler Directives statements are used to..............................................................
2.5 Constants and Literals
As we have previously discussed, DC means 'Declare Constant'. In the actual implementation, however, we do not declare a constant: we give a label to a memory location and put a value in that particular memory space. Its value can be changed as per the user's requirement. To understand this, consider variables in the C language. We can declare a variable in C in this way:

int a = 5;

Its value can change with the user's requirements. Assembly language can take constants in two ways:

1. Immediate operands
2. Literals
An example of an immediate operand is:

ADD AREG, 5

A literal is an operand with the syntax:

='<value>'
A literal is different from a constant in two ways:
1. Its value cannot be changed during program execution.
2. Since the value of a literal cannot be changed, it is more secure than a constant.

A literal is identified by the prefix =.
2.5.1 Difference between literal and constant

1. A literal is an operand specified with the '=' sign, whereas a constant is an immediate operand.
2. Whenever a literal is found in the program, the assembler first allocates a memory space, adds a label to that memory space, and then puts the value in it. No such arrangement is made for a constant in assembly language.
3. The value of a literal cannot be changed during the execution of the program, whereas the value of a constant can be changed during execution.
4. A literal is the most secure construct in assembly language; a constant is not secure, as its value can be changed at any time during program execution.
2.6 Assembly Scheme

The design procedure of an assembler includes some steps that one needs to follow in order to design a new assembler. These steps are:
1. Specify the problem.
2. Specify the data structures to be used.
3. Define the format of the data structures.
4. Specify the algorithm that will obtain and maintain the information.
2.6.1 Phases of Assembler
There are two phases in which an assembler works:
1. Analysis phase
2. Synthesis phase
1. Analysis Phase
The main tasks of the analysis phase are to create the symbol table and the literal table. The symbol table stores all the symbols present in the program, along with their addresses, and the literal table stores all the literals. The analysis phase also performs memory allocation. For memory allocation it uses a data structure called the location counter (LC). The location counter always contains the address of the next memory word in the program; initially it contains the value specified by the START statement. Whenever a new label is encountered, it is stored in the symbol table. In order to update the contents of the LC, the analysis phase needs to know the size of each statement; to get this, it consults the mnemonic table. This process of maintaining the location counter is known as LC processing.

Tasks performed by the analysis phase are:
1. Separate the label, mnemonic and operands.
2. If a label is present in the statement, put it into the symbol table.
3. Check the validity of the mnemonic using the mnemonic table.
4. Perform LC processing.
2. Synthesis Phase
The main task of this phase is to generate the equivalent machine code for the given assembly code. In generating machine code it uses data structures such as SYMTAB (symbol table) and LITTAB (literal table). This phase obtains the machine code for the symbols generated by the analysis phase: the address of each literal used in the program is obtained from the LITTAB, and the opcodes for mnemonics are obtained from the mnemonic table.

Tasks performed by the synthesis phase are:
1. Obtain the addresses of mnemonics, symbols and literals from their respective data structure tables.
2. Generate machine code.
2.7 One pass Assembler
It is also known as a single-pass assembler, as it scans the input file only once. It is much faster than a two-pass assembler, which scans the file twice. It creates all the data structures like SYMTAB, LITTAB, etc., and it also performs LC processing. This assembler suffers from the problem of forward referencing: since the program is scanned only once, some symbols may be used earlier in the program and defined later. The problem this creates for the symbol table is that the entry for such a symbol cannot be made until its address is known.
The solution to forward referencing is backpatching. In the process of backpatching, an additional data structure is required: the TII (Table of Incomplete Instructions). Instructions referring to symbols whose addresses are not yet known are initially left blank and recorded in the TII. Later, when the symbol is defined in the program, its address is stored in the TII. After the complete scan of the program, the contents stored in the TII are transferred to the SYMTAB.
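Backpatching with a table of incomplete instructions can be sketched as follows. This is a simplified single-pass illustration with a hypothetical two-statement instruction set, not the exact TII scheme of any particular assembler:

```python
def assemble_single_pass(statements):
    """Resolve forward references by backpatching.

    Each statement is ('LABEL', name) or ('JUMP', target).  A jump to a
    not-yet-defined label is left blank and its slot recorded in TII;
    it is patched as soon as the label's address becomes known.
    """
    SYMTAB, TII, code = {}, {}, []
    for kind, name in statements:
        if kind == "LABEL":
            SYMTAB[name] = len(code)            # address now known
            for slot in TII.pop(name, []):
                code[slot] = ("JUMP", SYMTAB[name])   # patch waiting slots
        else:  # JUMP
            if name in SYMTAB:
                code.append(("JUMP", SYMTAB[name]))   # backward reference
            else:
                TII.setdefault(name, []).append(len(code))
                code.append(("JUMP", None))     # forward reference: blank
    return code

prog = [("JUMP", "NEXT"), ("LABEL", "NEXT"), ("JUMP", "NEXT")]
print(assemble_single_pass(prog))   # [('JUMP', 1), ('JUMP', 1)]
```

The first JUMP is emitted blank and patched once NEXT is defined; the second JUMP can be resolved immediately, which is exactly the asymmetry that forces a single-pass assembler to keep a TII.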
2.8 Two Pass Assembler
As the name describes, a two-pass assembler uses two passes to scan the whole code written in assembly language. These two passes are generally referred to as Pass 1 and Pass 2. A detailed description of the working of Pass 1 and Pass 2 is given below.
2.8.1 Pass 1
The main job of Pass 1 of the assembler is to assign a memory location to each statement or instruction of the program. It is responsible for the generation of SYMTAB, LITTAB and the IC (intermediate code). In this phase, the value of the LC is initially set to 0 and is then incremented step by step as the instructions are scanned. If a symbol is discovered during scanning, it is stored in the SYMTAB data structure; if a literal is discovered, its information is stored in the LITTAB data structure. The various mnemonics are converted into their relevant opcodes using the OPTAB data structure. When the END statement is discovered, the first pass of the assembler is complete and the intermediate code of the program is produced as output, which in turn acts as the input for Pass 2 of the assembler.
2.8.2 Algorithm for Pass 1:
begin
    if starting address is given
        LOCCTR = starting address;
    else
        LOCCTR = 0;
    while OPCODE != END do        ;; or EOF
    begin
        read a line from the code
        if there is a label
            if this label is in SYMTAB, then error
            else insert (label, LOCCTR) into SYMTAB
        search OPTAB for the opcode
        if found
            LOCCTR += N           ;; N is the length of this instruction (4 for MIPS)
        else if this is an assembly directive
            update LOCCTR as directed
        else error
        write line to intermediate file
    end
    program size = LOCCTR - starting address;
end
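The same Pass 1 logic can be rendered as runnable Python. The OPTAB contents and instruction lengths below are hypothetical stand-ins; error handling is reduced to the duplicate-label check from the algorithm above:

```python
OPTAB = {"MOVER": 1, "ADD": 1, "STOP": 1}   # mnemonic -> instruction length

def pass1(lines, start=0):
    """Build the SYMTAB and compute the program size, as in the
    Pass 1 algorithm: labels get the current LOCCTR, and LOCCTR
    advances by each instruction's length."""
    SYMTAB, LC = {}, start
    for line in lines:
        tokens = line.split()
        if tokens[0] not in OPTAB:          # leading token is a label
            label = tokens.pop(0)
            if label in SYMTAB:
                raise ValueError("duplicate label: " + label)
            SYMTAB[label] = LC
        LC += OPTAB[tokens[0]]              # advance by instruction length
    return SYMTAB, LC - start               # symbol table, program size

symtab, size = pass1(["HERE MOVER", "ADD", "STOP"], start=100)
print(symtab, size)   # {'HERE': 100} 3
```

Real assemblers would also write each line to the intermediate file and handle assembler directives; both are omitted here to keep the sketch focused on SYMTAB construction.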
2.8.3 Flowchart for Pass 1:
Fig 2.2 Flowchart for Pass1
2.8.4 Intermediate code:
Intermediate code is the processed form of the source code written in assembly language, generated by Pass 1 of a two-pass assembler. It is submitted to Pass 2 of the assembler as input. The intermediate code consists of blocks that contain three parts:

1. Address
2. Mnemonic opcode
3. Operands

The mnemonic opcode field further contains two things:

1. Statement class, such as imperative statement or assembler directive
2. Opcode
2.8.5 Pass 2:
The Pass 2 phase of the assembler is responsible for generating the machine-equivalent code for the assembly language code. For code generation it uses all the data structures and the intermediate code generated by the Pass 1 phase. Initially the location counter is again set to 0, as was done in Pass 1. Pass 2 then reads the code blocks of the intermediate code one by one. If an assembler directive is encountered, the value of the location counter is set according to the memory address written in that statement; if an imperative statement is encountered, the value of the location counter is updated accordingly. If the operand of a statement is a symbol, the symbol is looked up in the SYMTAB data structure; if the operand is a literal, it is looked up in the LITTAB data structure.
2.8.6 Algorithm for Pass 2:
begin
    read a line;
    if opcode = START then
        write header record;
    while opcode != END do        ;; or EOF
    begin
        search OPTAB for the opcode;
        if found
            if the operand is a symbol then
                replace it with an address using SYMTAB;
            assemble the object code;
        else if it is a defined directive
            convert it to object code;
        add object code to the text;
        read next line;
    end
    write End record to the text;
    output text;
end
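Pass 2 can be sketched in the same style. The two-digit opcodes follow the mnemonic table of Section 2.4.1, the SYMTAB entries are assumed to come from Pass 1, and each intermediate-code block is reduced to a single (mnemonic, operand) pair for brevity:

```python
OPTAB = {"MOVER": "04", "ADD": "01", "STOP": "00"}   # opcodes from 2.4.1
SYMTAB = {"ONE": 113}          # assumed output of Pass 1

def pass2(intermediate):
    """Replace mnemonics with opcodes and symbol operands with the
    addresses recorded in SYMTAB, producing the object text."""
    text = []
    for mnemonic, operand in intermediate:
        code = OPTAB[mnemonic]
        if operand in SYMTAB:
            code += " " + str(SYMTAB[operand])   # symbol -> address
        elif operand:
            code += " " + operand                # already numeric/literal
        text.append(code)
    return text

ic = [("MOVER", "ONE"), ("ADD", "ONE"), ("STOP", "")]
print(pass2(ic))   # ['04 113', '01 113', '00']
```

Every reference to the symbol ONE is replaced by its SYMTAB address, which is the essential job of Pass 2.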
2.8.7 Flowchart for Pass 2:
Fig 2.3 Flowchart for Pass 2
Check Your Progress/ Self assessment Questions
Q4. Differentiate between literal and constant.
Q5. In the first pass, the assembler reads the program to collect the symbols defined with offsets in a _______:
a. Hash table
b. Symbol table
c. Both a & b
d. None of these
Q6. In the second pass, the assembler creates _______ in binary format for every instruction in the program and then refers to the symbol table, giving every symbol an ______ relating to the segment.
a. Code and program
b. Program and instruction
c. Code and offset
d. All of these
Q7. The different phases of an assembler are ……………….. and …………………….
2.9 Summary

- The assembler converts code written in assembly language into machine language.
- Assembly language is a low-level programming language which is machine dependent. It consists of a number of mnemonics that represent the operations to be performed.
- An assembly language statement generally consists of 4 components:
  [Label] <opcode> <operand specification> [<operand specification>…..]
- There are two phases in which an assembler works:
  1) Analysis phase: its main tasks are to create the symbol table and the literal table. The symbol table stores all the symbols and the literal table stores all the literals present in the program or code.
  2) Synthesis phase: its main task is to generate the equivalent machine code for the given assembly code. In generating machine code it uses data structures like SYMTAB (symbol table) and LITTAB (literal table).
- A single-pass assembler scans the input file only once. It is much faster than a two-pass assembler, which scans the file twice. It creates all the data structures like SYMTAB, LITTAB, etc.
- A two-pass assembler uses two phases, generally referred to as Pass 1 and Pass 2, to scan the whole code written in assembly language.
  1) Pass 1 assigns a memory location to each statement or instruction of the program. It is responsible for the generation of SYMTAB, LITTAB and the IC (intermediate code).
  2) Pass 2 is responsible for generating the machine-equivalent code for the assembly language code.
2.10 Glossary
- Assembler: converts code written in assembly language into machine language.
- Assembly language: a low-level programming language which is machine dependent.
- Imperative statements: statements that tell the processor what operation is to be performed.
- Declarative statements: statements used either to declare storage or to declare a constant.
- Assembler directives: statements used to direct the assembler to perform particular tasks during translation of the source code into destination code.
2.11 ANSWERS TO SELF ASSESSMENT QUESTIONS
1. An assembly language statement generally consists of 4 components.
[Label] <opcode><operand specification> [<operand specification>…..]
2. These statements are used either to declare storage or to declare a constant. There are only two declarative statements in assembly language, i.e. DS and DC. DS stands for Declare Storage, whereas DC stands for Declare Constant.
3. Direct the assembler to perform some particular tasks during translation of the source codeinto destination code.
4. See topic 2.5.1
5. b) Symbol table
6. c) Code and offset
7. Analysis and synthesis
2.12 REFERENCES
1. System Software, Charanjeet Singh, Kalyani publishers
2. System Programming, Donovan, 2nd Edition, Tata McGraw-Hill Education.
3. Systems Programming, D M Dhamdhere, Tata McGraw-Hill Education, 2011
2.13 MODEL QUESTIONS
Q1. What is an assembler and how does it work?
Q2. Elaborate the statement format in assembly language. Explain the use of assembly language.
Q3. Explain the uses of the Pass 1 phase of the assembler.
Q4. Explain the uses of the Pass 2 phase of the assembler.
Q5. Compare single-pass and two-pass assemblers.
Lesson 3 Assembly Language
3.0 Objective
3.1 Introduction
3.2 Assembly language and assembler
3.3 Assembly language program to its mnemonic equivalent code
3.4 MASM
3.5 Using MASM in Visual C++ 2010
3.6 Programs in assembly language
3.7 Summary
3.8 Glossary
3.9 Answers to self-assessment questions
3.10 References
3.11 Model Questions
3.0 Objective
After studying this chapter you will be able to:
Write your own programs in assembly language
Understand prewritten assembly language programs
Have knowledge of the latest assemblers
3.1 Introduction
In this chapter, we will study how to write programs in assembly language, how these programs are decoded by assemblers and converted into machine language, and how the results are converted back into assembly language. In the end, we will learn about some recent assemblers such as MASM.
3.2 Assembly language and Assembler
As we have already discussed assembly language and assemblers in previous chapters, we shall now move further and talk about a few more things. Before moving on, let us take a look at the working of an assembler and how the conversion is done from assembly language to machine language and vice versa. Consider the diagram below, which elaborates the working of an assembler on assembly language.
Fig. 3.1 Working of Assembler
As we can see in the figure, the assembler converts assembly language into machine language and vice versa using the database it has, and always presents the result in assembly language so that the user can understand it. In the previous chapter we discussed concepts like the assembly language statement format and the mnemonic codes for the various assembly language statements; now we will move further and see how a program is converted into mnemonic codes and then into binary codes.
3.3 Assembly language program to its mnemonic equivalent code
Before we start learning how a program written in assembly language is converted to its equivalent mnemonic code, we need to learn a few things. The first statement of every program is the START statement, which is followed by a memory location; that memory location defines where the program begins. After that, each statement gets one block of memory. For ease of understanding, we always use 200 as the address of our first statement.
To understand the conversion, we will take a factorial calculator program in assembly language and find its mnemonic equivalent.
Assembly language program          |        Mnemonic code program
Label   Instruction  Operands      |  Memory    Opcode  Register  Memory
                                   |  location          operand   operand
        START        200           |
        READ         N             |  200       09      0         212
        MOVER        BREG, ONE     |  201       04      2         233
        MOVEM        BREG, TERM    |  202       05      2         234
AGAIN   MULT         BREG, TERM    |  203       03      2         234
        MOVER        CREG, TERM    |  204       04      3         234
        ADD          CREG, ONE     |  205       01      3         233
        MOVEM        CREG, TERM    |  206       05      3         234
        COMP         CREG, N       |  207       06      3         212
        BC           LE, AGAIN     |  208       07      2         203
        MOVEM        BREG, RESULT  |  209       05      2         213
        PRINT        RESULT        |  210       10      0         213
        STOP                       |  211       00      0         000
N       DS           1             |  212
RESULT  DS           20            |  213
ONE     DC           '1'           |  233       00      0         001
TERM    DS           1             |  234
        END
Here, as the example shows, instructions and operands are converted according to their relevant mnemonic opcodes, and the memory location is incremented by one block for each statement, so this is not a tough part to understand. All the mnemonic code values were given in the previous chapter; all conversions are done on that basis.
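This translation step can be sketched in a few lines of Python. The opcode and register codes below are taken from this chapter's examples; the function name and the symbol-table fragment are illustrative, and the sketch handles only the simple statement shapes seen above.

```python
# A minimal sketch of the mnemonic-translation step, using the
# hypothetical opcode and register codes from this chapter's examples.
OPCODES = {"STOP": "00", "ADD": "01", "MULT": "03", "MOVER": "04",
           "MOVEM": "05", "COMP": "06", "BC": "07", "READ": "09",
           "PRINT": "10"}
REGISTERS = {"AREG": "1", "BREG": "2", "CREG": "3"}

def translate(stmt, symbol_table, location):
    """Translate one statement into (location, opcode, register, memory)."""
    parts = stmt.replace(",", " ").split()
    opcode = OPCODES[parts[0]]
    # First operand, if it is a register, gives the register field.
    reg = REGISTERS.get(parts[1], "0") if len(parts) > 1 else "0"
    # The last operand is looked up in the symbol table built by Pass 1.
    mem = symbol_table.get(parts[-1], "000") if len(parts) > 1 else "000"
    return (location, opcode, reg, mem)

symbols = {"N": "212", "TERM": "234"}   # fragment of the symbol table
print(translate("MULT BREG,TERM", symbols, 203))  # (203, '03', '2', '234')
```

Running it on the MULT statement of the factorial program reproduces the table row 203 / 03 / 2 / 234.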
Check Your Progress/ Self-assessment Questions
Q1. What is the need of assembly language?
Ans. ……………………………………………………………………………………………
…………………………………………………………………………………………………
……….…………………………………………………………………………………………
Q2. Convert this assembly language code into its mnemonic equivalent code.
START 200
READ FN
READ SN
MOVER AREG, FN
MOVER BREG, SN
ADD AREG, BREG
MOVEM RS, AREG
FN DC '0'
SN DC '0'
RS DC '0'
Ans.
Assembly language program          |        Mnemonic code program
Label   Instruction  Operand       |  Memory    Opcode  Register  Memory
                                   |  location          operand   operand
3.4 MASM
MASM is an acronym for Microsoft Macro Assembler, whose first version was developed in 1981. This assembler was specifically designed for MS-DOS and Microsoft Windows. It was first designed for the 16-bit architecture; nowadays it has two supported architectures, i.e. 32-bit and 64-bit. The last version of MASM that was sold separately was 6.12; after that, Microsoft included MASM in its C compiler, Visual C++, which also supports assembly language. As many versions of assembly language are available in the market (it is machine dependent), we should learn which particular version of assembly language is used by MASM. Early versions of MASM supported object models using OMF, which was used to generate the binary equivalent of a given assembly language program. From the day Microsoft packed MASM into its C compiler, it started using the Portable Executable format for model generation.
3.5 Using MASM in Visual C++ 2010
Follow these steps to start using Visual C++ 2010 as an assembler.
1.) In the given templates for project types, click on 'Other Project Types', select 'Visual Studio Solutions' and choose 'Blank Solution'.
2.) Under 'Other Languages', choose 'Visual C++' and select 'Empty Project' under the 'General' option.
3.) Right click on the Project in the Solution Explorer and select ‘Build Customizations’.
4.) Select ‘MASM’ and click on ‘OK’ button.
5.) Then give the name of the file; its extension will be .asm. Then click on the OK button.
6.) Now, if you want, you can give additional property values such as the start address. If you do not specify these values, the assembler will use default values.
3.6 Programs in assembly language
Now, we will write some programs in assembly language to make clear how various programs are built. First we will look at two simple programs; then we will do some exercises.
Write a program to add two numbers in assembly language.
START 200
READ FN
READ SN
MOVER AREG, FN
MOVER BREG, SN
ADD AREG, BREG
MOVEM RS, AREG
FN DC ‘5’
SN DC ‘4’
RS DC ‘0’
This is a simple program that adds two numbers stored in the memory locations labelled FN and SN; the final result is stored in the memory location labelled RS.
Write a program to print a number
START 200
READ N
PRINT N
N DC ‘5’
This program will print the constant value stored at the memory location labelled N.
Check Your Progress/ Self-assessment Questions
Q3. What is MASM? Why do we need it?
Ans. ……………………………………………………………………………………………
…………………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
Q4. Write a program to multiply three numbers.
Ans. ……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
……….…………………………………………………………………………………………
3.7 Summary
In this chapter, we have studied assembly language and the assembler. We learnt to convert an assembly language program into its mnemonic code, from which the assembler converts the program into binary. Then we learnt about MASM and how to use it from Visual C++ 2010. Finally, we saw a few programs to understand how to write programs in assembly language.
3.8 Glossary
Assembly language: A low-level programming language that is machine dependent and requires an assembler for its conversion into binary and vice versa.
Assembler: A program that converts assembly language to machine language and vice versa.
MASM: Stands for Microsoft Macro Assembler; it comes with Visual C++ and allows the programmer to write and execute programs in assembly language.
3.9 Answers to self-assessment questions
Q1. Before assembly language, programmers used to write code in binary, which was very difficult to understand and debug, so assembly language was designed. It looks like the English language, so it is easy to understand. The only disadvantage of assembly language is that it requires an additional program to convert it into binary, as the processor cannot directly understand assembly language.
Q2.
Assembly language program          |        Mnemonic code program
Label   Instruction  Operand       |  Memory    Opcode  Register  Memory
                                   |  location          operand   operand
        START        200           |
        READ         FN            |  201       09      0         207
        READ         SN            |  202       09      0         208
        MOVER        AREG, FN      |  203       04      1         207
        MOVER        BREG, SN      |  204       04      2         208
        ADD          AREG, BREG    |  205       01      2
        MOVEM        RS, AREG      |  206       05      2         209
FN      DC           '0'           |  207       00      0         207
SN      DC           '0'           |  208       00      0         208
RS      DC           '0'           |  209       00      0         209
        END
Q3. MASM is an acronym for Microsoft Macro Assembler, whose first version was developed in 1981. This assembler was specifically designed for MS-DOS and Microsoft Windows. It was first designed for the 16-bit architecture; nowadays it supports the 32-bit and 64-bit architectures. We need MASM when we want to write a program in assembly language: our basic requirement is then an assembler, and as nowadays no assembler is separately available in the market, Microsoft provides a combined assembler with Visual C++ 2010. We can use it to write and execute programs in assembly language.
Q4.
START 200
READ FN
READ SN
READ TN
MOVER AREG, FN
MOVER BREG, SN
MULT AREG, BREG
MOVER BREG, TN
MULT AREG, BREG
MOVEM AREG, RS
FN DC '0'
SN DC ‘0’
TN DC ‘0’
RS DC ‘0’
END
3.10 REFERENCES
1. System Software, Charanjeet Singh, Kalyani publishers
2. System Programming, Donovan, 2nd Edition, Tata McGraw-Hill Education.
3. Systems Programming, D M Dhamdhere, Tata McGraw-Hill Education, 2011
3.11 MODEL QUESTIONS
1. Write a program to calculate the sum of five odd numbers in assembly language.
2. Write a program to calculate the multiplication of three integers in assembly language.
Chapter 4 Macro Instructions
Structure of the lesson
4.0 Objective
4.1 Introduction
4.2 Macros
4.3 Macro Expansion
4.4 Features of Macro Facility
4.4.1 Macro instruction arguments
4.4.2 Conditional macro expansions
4.4.3 Macro calls within macros
4.5 Summary
4.6 Glossary
4.7 Answers to check your progress/self assessment questions
4.8 References/ Suggested Readings
4.9 Model Questions
4.0 Objective
After studying this lesson, student will be able to:
Explain macros.
Describe the features of macros.
Implement macro arguments.
Discuss conditional macro expansion.
Use macro calls within macros.
4.1 Introduction
In this chapter the concepts of macros, macro instructions and macro expansion will be elaborated. A macro is an abbreviation used to define a sequence of operations. When a program needs to perform the same set of instructions again and again, macros come into the picture: we define a macro for that repeated set of instructions and use it in the program instead of repeating the same instructions. Also, when the chances of changing a particular set of operations are high, it is easier to make the changes in the defined macro than in the whole program; changes made in the macro are automatically reflected throughout the program. This chapter also describes macro instruction arguments and macro calls within macros.
4.2 Macro
While writing an assembly language program, a programmer may have to repeat blocks of code that perform a particular task. The programmer therefore defines a single machine instruction, known as a macro, to represent such a block of code. Once defined, the macro can be used in place of those repeated instructions. There are many definitions of macros; some are given below:
1. A macro is a name or abbreviation for a part of a program or for a sequence of operations.
2. A set of code for a particular operation can be defined as a macro. Defining any subroutine like this is called a macro definition. This is essentially a text-replacement capability.
3. A macro represents a commonly used group of statements in the source program.
A macro consists of a name, a set of arguments and a block of code. It is just like a function used to perform a particular task and can be used anywhere in the program. The calling of a macro is called a macro call; a macro call is also known as a macro invocation.
There are very few differences between macro call and procedure call.
1. The macro call is made during the assembly process, whereas the procedure call is made during program execution.
2. In a macro call the body is put into the object program, but in a procedure call control is transferred to the procedure.
3. There is no need for any return statement in macros, but a procedure is expected to return something.
4. For every macro call, the body of the macro is copied into the object program. In a procedure call, the body of the procedure appears only once in the object program.
Prototype for the macro
Each parameter begins with ‘&’. The following structure shows how to define a macro in theprogram.
Name MACRO &parameter-list
:
Body
:
MEND
Name: the name given to the macro.
MACRO: identifies the beginning of a macro definition.
&parameter-list: defines the parameters that can be passed during a macro call.
Body: the set of statements that will be generated as the expansion of the macro.
MEND: identifies the end of a macro definition.
The keywords MACRO and MEND are macro directives. Macros can have parameters, as in subroutines; this expands the scope of the macro to various other situations. The parameters can be formal or actual, as in procedures: formal parameters appear in the macro definition, and actual parameters appear in macro calls.
Let us consider an example, which shows the use of a pseudo-op named Define Constant (DC).
A 1, DATA      Add contents of DATA to register 1
A 2, DATA      Add contents of DATA to register 2
A 3, DATA      Add contents of DATA to register 3
:
:
A 1, DATA      Add contents of DATA to register 1
A 2, DATA      Add contents of DATA to register 2
A 3, DATA      Add contents of DATA to register 3
:
:
DATA DC F '2'  DATA holds the constant value 2, defined with DC
:
:
In this program, the following sequence occurs twice.
A 1, DATA      Add contents of DATA to register 1
A 2, DATA      Add contents of DATA to register 2
A 3, DATA      Add contents of DATA to register 3
So in this case a macro can be used to perform this operation. Let us define a macro named ADD. Following the prototype, we can define it as follows:
        MACRO
ADD                                                       (name of the macro)
        A 1, DATA    Add contents of DATA to register 1   (body of
        A 2, DATA    Add contents of DATA to register 2    the macro)
        A 3, DATA    Add contents of DATA to register 3
        MEND
Check Your Progress/ Self assessment Questions
Q1. What is the need for macros?
………………………………………………………………………………………………………
…………………………………………………………………………………………………….
……………………………………………………………………………………………………..
Q2. What is the difference between a macro call and a procedure call?
………………………………………………………………………………………………………
…………………………………………………………………………………………………….
……………………………………………………………………………………………………..
Q3. Macro definition is also called macro call :
TRUE
FALSE
4.3 MACRO EXPANSION
Macro expansion is done automatically by the interpreter or compiler, by replacing each macro call with the pattern described in the macro. In compiled languages macro expansion always happens at compile time. The tool that performs macro expansion is known as a macro expander. Once a macro is defined, the name of the macro can be used rather than writing the entire instruction sequence again and again; moreover, the overhead associated with macros is very small. On encountering the macro ADD defined above, the compiler will replace it with the body of the macro (which defines the set of operations associated with that macro). You can see the source and the corresponding expanded source in the following code:
The macro processor replaces each macro call with the following lines:
A 1, DATA
A 2, DATA
A 3, DATA
The process of such a replacement is known as expanding the macro. The macro definition itself does not appear in the expanded source code, because the macro processor saves the definition of the macro. An occurrence of the macro name in the source program is a macro call: when the macro is called, the sequence of instructions corresponding to that macro name replaces the call in the expanded source.
Source                |  Expanded source
                      |
MACRO                 |
ADD                   |
A 1, DATA             |
A 2, DATA             |
A 3, DATA             |
MEND                  |
.                     |  .
ADD                   |  A 1, DATA
.                     |  A 2, DATA
.                     |  A 3, DATA
.                     |  .
ADD                   |  A 1, DATA
.                     |  A 2, DATA
.                     |  A 3, DATA
.                     |  .
:                     |  .
DATA DC F '2'         |  DATA DC F '2'
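The save-the-definition, replace-the-call behaviour of a macro expander can be sketched in Python. This is a simplification, not the book's algorithm: the name line is assumed to follow the MACRO directive directly, as in this chapter's listings, and nested definitions are not handled.

```python
def expand(source_lines):
    """Sketch of a one-pass macro expander: save each MACRO...MEND
    definition, then replace every macro call with the saved body."""
    macros, output = {}, []
    it = iter(source_lines)
    for line in it:
        if line.strip() == "MACRO":
            name = next(it).strip()          # name line follows MACRO
            body = []
            for body_line in it:
                if body_line.strip() == "MEND":
                    break
                body.append(body_line)
            macros[name] = body              # definition saved, not emitted
        elif line.strip() in macros:
            output.extend(macros[line.strip()])   # macro call: emit the body
        else:
            output.append(line)
    return output

src = ["MACRO", "ADD", "A 1, DATA", "A 2, DATA", "A 3, DATA", "MEND",
       "ADD", "DATA DC F '2'"]
print(expand(src))
# ['A 1, DATA', 'A 2, DATA', 'A 3, DATA', "DATA DC F '2'"]
```

Note that the definition lines produce no output at all; only the call sites generate expanded code.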
4.4 Features of Macro Facility
Macro instruction arguments
Conditional macro expansion
Macro calls within macros
4.4.1 Macro Instruction Arguments
Macro calls are used to replace repeated sets of instructions in a program, but this facility by itself is not as flexible as needed: whenever a macro call is made, the code that replaces it remains the same and cannot be changed easily. The way to vary it is to use arguments, or parameters, in the macro calls. Consider the following program.
:
:
A 1, DATA1
A 2, DATA1     Block 1
A 3, DATA1
:
:
A 1, DATA2
A 2, DATA2     Block 2
A 3, DATA2
:
:
DATA1 DC F '5'
DATA2 DC F '10'
In this example, Block 1 and Block 2 perform the same operation but on different data values: the first sequence operates on the operand DATA1, the second on DATA2. We can use the concept of macro instruction arguments to handle such situations. The two different operands, DATA1 and DATA2, can be passed as actual arguments in two separate calls to the same macro, so that the same operation is performed on each. The program below contains a dummy argument, also known as a macro instruction argument. The above program can be rewritten as follows:
Source                |  Expanded source
                      |
MACRO                 |
ADD &PAR              |  (&PAR is the dummy argument)
A 1, &PAR             |
A 2, &PAR             |
A 3, &PAR             |
MEND                  |
.                     |  .
ADD DATA1             |  A 1, DATA1   (the operation is
.                     |  A 2, DATA1    performed on DATA1)
.                     |  A 3, DATA1
.                     |  .
ADD DATA2             |  A 1, DATA2   (the operation is
.                     |  A 2, DATA2    performed on DATA2)
.                     |  A 3, DATA2
.                     |  .
:                     |  .
DATA1 DC F '2'        |  DATA1 DC F '2'
DATA2 DC F '7'        |  DATA2 DC F '7'
In the above program, a dummy argument is specified by inserting an ampersand (&) before it. Any number of arguments can be passed to a macro, depending on the programmer's needs. The important thing to understand about macro instruction arguments is that each argument must correspond to a dummy argument on the macro name line of the macro definition. The supplied arguments are substituted for the respective dummy arguments whenever a macro call is processed.
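Argument substitution at expansion time can be sketched in Python; each dummy argument is simply replaced, textually, by the corresponding actual argument (the function name is illustrative):

```python
def expand_call(body, formals, actuals):
    """Substitute actual arguments for the dummy (&-prefixed)
    arguments, position by position, in every body line."""
    out = []
    for line in body:
        for formal, actual in zip(formals, actuals):
            line = line.replace(formal, actual)
        out.append(line)
    return out

body = ["A 1, &PAR", "A 2, &PAR", "A 3, &PAR"]
print(expand_call(body, ["&PAR"], ["DATA1"]))
# ['A 1, DATA1', 'A 2, DATA1', 'A 3, DATA1']
print(expand_call(body, ["&PAR"], ["DATA2"]))
# ['A 1, DATA2', 'A 2, DATA2', 'A 3, DATA2']
```

The two calls with different actual arguments reproduce the two expansions shown in the listing above.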
4.4.2 Conditional Macro Expansion
The concept of conditional statements is very common in programming. Programs are sequential in nature, but in some situations we may need to change the flow of execution based on some condition; this is implemented with conditional statements. Conditional macro expansion serves the same purpose at expansion time. There are two important macro processor pseudo-operations, namely AIF and AGO.
a) AIF is a conditional branch pseudo-operation: a condition is tested, and if it evaluates to true, expansion branches to the specified label.
b) AGO is an unconditional branch pseudo-operation and behaves like a goto statement: it transfers control to the macro instruction carrying the specified label.
These statements are directives to the macro processor and do not appear in the macro expansion. The concept of conditional macro expansion can be explained with the help of the following example.
BLOCK1  A 1, DATA1
        A 2, DATA2
        A 3, DATA3

BLOCK2  A 1, DATA3
        A 2, DATA2

BLOCK3  A 1, DATA1
        A 2, DATA3
        A 3, DATA2
        A 4, DATA4

DATA1 DC F'2'
DATA2 DC F'7'
DATA3 DC F'11'
DATA4 DC F'15'
In the above code, the number of instructions, the data operands and the labels are different for each block. This program could be written as follows:
        MACRO
&PAR0   LOOP  &NUMBER, &PAR1, &PAR2, &PAR3, &PAR4
&PAR0   A 1, &PAR1
        A 2, &PAR2
        AIF (&NUMBER EQ 2) .FINISH     conditional test: the value of
        A 3, &PAR3                     &NUMBER is checked, and if the
        AIF (&NUMBER EQ 3) .FINISH     condition is true, control is
        A 4, &PAR4                     transferred to the end of the macro
.FINISH MEND
Source                                        |  Expanded source
                                              |
BLOCK1  LOOP 3, DATA1, DATA2, DATA3           |  BLOCK1  A 1, DATA1
                                              |          A 2, DATA2
                                              |          A 3, DATA3
                                              |
BLOCK3  LOOP 4, DATA1, DATA3, DATA2, DATA4    |  BLOCK3  A 1, DATA1
                                              |          A 2, DATA3
                                              |          A 3, DATA2
                                              |          A 4, DATA4

As the example shows, when the value of &NUMBER is 2, only the first two data parameters take part in the expansion: the AIF statement checks the value and ends the expansion early. The same sequence of parameters passed in the macro call is used in the macro expansion. Labels starting with a period, like .FINISH, are macro labels; branching to such a label transfers control to the statement where the label is written. Thus AIF and AGO control the sequence in which the macro processor expands the instructions of a macro.
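An expansion-time interpretation of AIF and the period-prefixed macro labels can be sketched in Python. This is a simplification: the AIF syntax here is a flattened "AIF <var> EQ <value> <.label>" rather than the book's parenthesised form, and labels stand on lines of their own.

```python
def expand_with_aif(body, env):
    """Sketch: interpret AIF/AGO and .label lines at expansion time;
    every other line is emitted with &-arguments substituted."""
    labels = {l.split()[0]: i for i, l in enumerate(body)
              if l.startswith(".")}
    out, i = [], 0
    while i < len(body):
        line = body[i]
        if line.startswith("AIF"):
            _, var, _, value, target = line.split()
            if env[var] == value:          # condition true: branch
                i = labels[target]
                continue
        elif line.startswith("AGO"):
            i = labels[line.split()[1]]    # unconditional branch
            continue
        elif not line.startswith("."):
            for formal, actual in env.items():
                line = line.replace(formal, actual)
            out.append(line)               # ordinary line: substitute, emit
        i += 1
    return out

body = ["A 2, &PAR2",
        "AIF &NUMBER EQ 2 .FINISH",
        "A 3, &PAR3",
        ".FINISH"]
env = {"&NUMBER": "2", "&PAR2": "DATA2", "&PAR3": "DATA3"}
print(expand_with_aif(body, env))   # ['A 2, DATA2']
```

With &NUMBER set to 2 the AIF test succeeds and the A 3 line never reaches the expanded source, mirroring the LOOP example above.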
4.4.3 Macro calls within Macros
One macro can be used within the definition of another macro; thus one macro can be called from another. Macro calls made within other macros are expanded over several levels. The conditional macro operations AGO and AIF can also be used to design such macros according to the programmer's needs.
        MACRO
MUL     &PAR
        L  1, &PAR     load the passed argument into register 1
        M  1, =F'4'    multiply the contents of register 1 by the constant 4
        ST 1, &PAR     store the value of register 1 in the passed variable
        MEND

        MACRO          (a new macro calling the already defined macro MUL)
MUL1    &PAR1, &PAR2
        MUL &PAR1
        MUL &PAR2
        MEND
In the above example, macro MUL1 calls MUL with different parameters. This macro expansion takes place at various levels, as shown below:
Source               |  Expanded source (level 1)  |  Expanded source (level 2)
                     |                             |
MUL1 DATA1, DATA2    |  MUL DATA1                  |  L  1, DATA1
                     |                             |  M  1, =F'4'
                     |                             |  ST 1, DATA1
                     |  MUL DATA2                  |  L  1, DATA2
                     |                             |  M  1, =F'4'
                     |                             |  ST 1, DATA2
                     |                             |
DATA1 DC F'5'        |  DATA1 DC F'5'              |  DATA1 DC F'5'
DATA2 DC F'3'        |  DATA2 DC F'3'              |  DATA2 DC F'3'
At level 1, MUL1 is expanded, which in turn makes calls to the two MUL macros. These two MUL calls are expanded at the next level with their respective parameters. Macro calls, when combined with conditional macro operations, provide ample scope to programmers and increase the reusability and flexibility of the code.
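The level-by-level expansion can be sketched as a recursive function in Python. Macro definitions are assumed to have been collected into a dictionary beforehand; the names and data structures are illustrative.

```python
def expand_nested(lines, macros):
    """Sketch of multi-level expansion: if an emitted body line is itself
    a macro call, expand it recursively (level 2, level 3, ...)."""
    out = []
    for line in lines:
        name = line.split()[0]
        if name in macros:
            formals, body = macros[name]
            actuals = line.replace(",", " ").split()[1:]
            expanded = list(body)
            for formal, actual in zip(formals, actuals):
                expanded = [l.replace(formal, actual) for l in expanded]
            out.extend(expand_nested(expanded, macros))  # next level
        else:
            out.append(line)
    return out

macros = {
    "MUL":  (["&PAR"], ["L 1, &PAR", "M 1, =F'4'", "ST 1, &PAR"]),
    "MUL1": (["&PAR1", "&PAR2"], ["MUL &PAR1", "MUL &PAR2"]),
}
print(expand_nested(["MUL1 DATA1, DATA2"], macros))
# ['L 1, DATA1', "M 1, =F'4'", 'ST 1, DATA1',
#  'L 1, DATA2', "M 1, =F'4'", 'ST 1, DATA2']
```

The single MUL1 call first expands to two MUL calls (level 1), which the recursive call then expands into the six machine instructions (level 2), matching the table above.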
Check Your Progress/ Self assessment Questions
Q4. What is macro expander?
………………………………………………………………………………………………………
…………………………………………………………………………………………………….
……………………………………………………………………………………………………..
Q5. What is a macro label? Give its example.
………………………………………………………………………………………………………
…………………………………………………………………………………………………….
……………………………………………………………………………………………………..
Q6.True or False:
a) Macros cannot be nested:................
b) Macro calling involves name of the macro and the arguments to be passed :............
c) AGO is a conditional branch statement. :...................
d) MACRO and MEND are macro directives......................
Q7.The macro processor must perform:
a) Recognize macro definition and macro call
b) Save the macro definitions
c) Expand macro calls and substitute arguments
d) all of these
4.5 Summary
A macro is a name or abbreviation for a part of a program or for a sequence of operations.
A macro consists of a name, a set of arguments and a block of the code.
The macro can be used anywhere in the program. The calling of a macro is called a macro call.
On encountering a defined macro, the compiler replaces it with the body of the macro (which defines the set of operations associated with that macro). The process of such a replacement is known as expanding the macro.
The flexibility of macros can be enhanced by passing any number of arguments to them; these are known as macro instruction arguments.
Just as in ordinary programming, the flow of expansion can be changed based on conditions as per requirement. This is implemented with the help of conditional statements.
There are two important macro processor conditional pseudo-operations, namely AIF and AGO.
One macro can be used within the definition of another macro. This is called macro calls within macros.
Macro calls made within other macros usually expand over several levels.
4.6 Glossary
Macro: a name or abbreviation for a part of a program or for a sequence of operations.
Macro call: the calling of a macro; also known as a macro invocation.
Macro expansion: performed automatically by the interpreter or compiler by replacing a macro call with the pattern described in the macro.
Conditional macro expansion: used to change the flow of expansion in a program from purely sequential to conditional.
Macro label: a label starting with a period (.) that transfers control to the statement where the label is written.
4.7 Answers to Self Assessment questions
1. While writing an assembly language program, a programmer may have to repeat blocks of code that perform a particular task. The programmer therefore defines a single machine instruction, known as a macro, to represent such a block of code. Once defined, the macro can be used in place of those repeated instructions.
2. The macro call is made during the assembly process, whereas the procedure call is made during program execution. In a macro call the body is put into the object program, but in a procedure call control is transferred to the procedure. There is no need for any return statement in macros, but a procedure is expected to return something. For every macro call, the body of the macro is copied into the object program; in a procedure call, the body of the procedure appears only once in the object program.
3. FALSE
4. Macro expansion is done automatically by the interpreter or compiler by replacing the pattern described in the macro. In compiled languages, macro expansion always happens at compile time. The tool that performs macro expansion is known as a macro expander.
5. A label starting with a period (.), like .END, is a macro label; it transfers control to the statement where .END is written in the program.
6. a) FALSE
b) TRUE
c) FALSE
d) TRUE
7 d) all of these
4.8 References
1. System Programming , Donovan, 2nd Edition, Tata McGraw-Hill Education.
2. Systems Programming, D M Dhamdhere, Tata McGraw-Hill Education, 2011
4.9 Model Questions
1. What do you mean by macros? Why are they needed in a programming language?
2. Discuss the prototype of a macro.
3. What is the need for conditional expansion of macros?
4. What do you mean by AGO and AIF?
5. How does macro expansion take place in a program?
6. How can macro calls within macros be implemented? Discuss in detail.
Lesson 5: Loaders
5.0 Objectives
5.1 Introduction
5.2 Functions of loader
5.3 Compile and go loader
5.4 General Loader Scheme
5.5 Absolute Loaders
5.6 Summary
5.7 Glossary
5.8 Answers to check your progress
5.9 References/Suggested Readings
5.10 Model Questions
5.0 Objectives:-
After studying this lesson, the student will be able to:
List the functions of loaders.
Discuss the concept of the compile-and-go loader.
Explain the general loader scheme.
Define the concept of absolute loaders.
5.1 Introduction
A loader is a system program, part of the operating system, that performs the loading function: it allocates memory, brings the object program into memory and starts its execution.
The period during which the user program executes is called execution time, and the translating period is called assembly or compile time.
Figure 5.1 Role of loader
Source program – the assembly language program.
Object program – produced by the assembler; contains the translated instructions and data values from the source program.
Executable code – produced by the linker.
Loader – loads the executable code into the specified memory locations, after which the code is executed.
5.2 Functions of loaders
The loader performs following functions:
Allocation – the loader examines and allocates the memory space needed for execution of the program.
Linking – combines two or more separate object modules and supplies the information needed to resolve references between them.
Relocation – the loader maps and relocates address references to the newly allocated memory space during execution.
Loading – the loader brings the object program into memory.
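The relocation function can be sketched in Python, assuming each object-code word is tagged with a flag saying whether its operand is an address (this tuple format is purely illustrative, not a real object-file layout):

```python
def relocate(code, assembled_origin, load_origin):
    """Sketch of relocation: adjust every address-type operand by the
    difference between the assembled origin and the actual load origin."""
    delta = load_origin - assembled_origin
    return [(opcode, operand + delta if is_addr else operand, is_addr)
            for (opcode, operand, is_addr) in code]

# Each word: (opcode, operand, operand-is-an-address?)
code = [("04", 233, True),    # address operand: must be relocated
        ("00", 1, False)]     # constant operand: left unchanged
print(relocate(code, 200, 900))   # [('04', 933, True), ('00', 1, False)]
```

A program assembled to start at 200 but loaded at 900 has every address operand shifted by 700, while constants are left alone.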
5.3 Compile and go loader
It is a link editor in which the assembler itself places the assembled instructions directly into the designated memory locations for execution.
Instructions are read line by line, and the program is assigned a starting address when the assembly process completes.
E.g. the WATFOR FORTRAN compiler. This loading scheme is also called an assemble-and-go or load-and-go system.
In this type of loader, assembling (or compiling), linking and loading happen in one step; as a result it does not require extra procedures.
Figure 5.2 Compile and go loader
Advantages
1. It is easy to implement.
2. It is a simple and efficient solution that does not involve extra procedures.
Disadvantages
1. A portion of memory is wasted because the core is occupied by the assembler.
2. It is necessary to retranslate (reassemble) the user's program every time it is run.
3. It is very difficult to handle multiple modules.
4. To execute an assembly program, it has to be assembled again and again.
5. All the code of the program has to be in the same language.
Self assessment questions 1
1. List various functions of loader.
________________________________________________________________________
________________________________________________________________________
2. _________________ is a link editor in which the assembler itself places the assembled instructions directly into the designated memory locations for execution.
3. It is easy to handle multiple modules in a compile-and-go loader. ( TRUE/ FALSE ).
______________________________________________________________________
5.4 General Loader Scheme
In the general loader scheme, the assembler produces the translated form of the source code. This output, containing the coded form of the instructions, is called the object program.
These object programs are not placed directly into the core; the instructions and data are saved elsewhere and loaded into the core whenever the code is to be executed.
The source program is translated into an object program by the assembler, and the object program is then loaded into main memory together with the loader.
The loader is smaller than the assembler, so more space is available for the object program.
The use of an object program as intermediate data requires the addition of a new program, called the loader, to the system. The loader accepts the object program and places it into the core in an executable form.
Figure 5.3 General loader
Advantages
1. It is not required to reassemble the program in order to run it at a later stage.
2. The loader is smaller than the assembler, so more memory is available to the user.
3. The assembler does not reside in memory at all times, so the core is not wasted.
Disadvantages
1. We have to store the object program and the loader, in addition to the source code.
5.5 Absolute Loaders
The object code is loaded at particular locations in memory; the loader then jumps to the specified address to begin execution of the loaded program.
The loader reads the file and places the code at the absolute address given in the file.
No relocation information needs to be stored as part of the object file.
Resolution of external references and linking of interdependent modules is done by the programmer, who is assumed to know how memory is managed.
In this scheme multiple segments are allowed.
With an absolute loader, the four loader functions are performed as follows:
o Allocation – by the programmer
o Linking – by the programmer
o Relocation – by the assembler
o Loading – by the loader
For this, the assembler must supply the following information through the object files:
o Starting address and name of each module.
o Length of each module.
Figure 5.4 Absolute loader
It requires two cards:
1. Text card2. Transfer card
Text card: It contains information about what is to be loaded.
Card Type: It indicate which type of card
0 for Text card.
1 for Transfer card.
Count: It indicates the amount of information to be loaded.
Address: It indicates the location at which the information is to be loaded.
Content: It indicates the binary information to be loaded.
Transfer Card: It is used to indicate where execution of the loaded program should begin.
Card type is 1.
Count is always 0.
Address: It indicates the location from where execution of the object program should begin.
Content: It is always kept blank.
Self assessment questions 2
4. ___________ card contains information about what is to be loaded.
5. Which card specifies the location at which the program is to be loaded?______________________________________________________________________
6. General loader does not require reassembling of program in order to run the programat later stage. ( TRUE/ FALSE ).
______________________________________________________________________
Algorithm for an absolute loader
Begin
    read Header record
    verify program name and length
    read first Text record
    while record type is not 'E' do
    begin
        {if object code is in character form, convert it into internal representation}
        move object code to specified location in memory
        read next object program record
    end
    jump to address specified in End record
End
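The algorithm above can be sketched as an executable simulation. The record layout used here (a type field, an address, and a list of content bytes) is a simplified stand-in for the text and transfer cards described earlier, assumed only for illustration:

```python
# Minimal simulation of the absolute loader algorithm; the record format
# (type/address/content) is a simplified stand-in for the card formats.

def absolute_load(records):
    """Place text-record bytes into a simulated memory; return (memory, entry)."""
    memory = {}
    entry = None
    for rec in records:
        if rec["type"] == 0:                    # text card: information to load
            for i, byte in enumerate(rec["content"]):
                memory[rec["address"] + i] = byte
        elif rec["type"] == 1:                  # transfer card: count 0, blank content
            entry = rec["address"]              # execution begins at this address
            break
    return memory, entry

records = [
    {"type": 0, "address": 0x100, "content": [0x0A, 0x0B]},
    {"type": 0, "address": 0x102, "content": [0x0C]},
    {"type": 1, "address": 0x100, "content": []},
]
memory, entry = absolute_load(records)
print(hex(entry), memory)
```

A real loader would of course place bytes into physical memory and jump to the entry address; here the dictionary stands in for core storage.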
5.6 Summary
Loader is a system program that brings an executable file stored on disk into memory and starts its execution. It reads the executable file to determine the size, and then creates a new address space for the program. After that it copies instructions, arguments, and data into the address space. It initializes the machine registers and jumps to a start routine. A number of schemes are available to implement the concept of loader.
5.7 Glossary
Loader - System program that loads the program into memory.
Operating system - System program responsible for the overall resource management of the system.
Assembler - It is used to perform the conversion of assembly code into machine code.
5.8 Answers to check your progress
1.
Allocation - The loader examines and allocates the memory space for the execution of the program.
Linking – It combines two or more different objects and provides needful information.
Relocation - The loader maps and relocates the address references to correspond to the newly allocated memory space during execution.
Loading - The loader brings object program into memory.
2. Compile and go loader
3. FALSE.
4. Text.
5. Transfer card.
6. TRUE.
5.9 References/Suggested Readings
1. Systems Programming by John J Donovan, Tata McGraw-Hill
2. Systems Programming by Dm Dhamdhere, Tata McGraw-Hill Education
3. Systems Programming by Charanjeet Singh, Kalyani Publications
5.10 Model Questions
1. List the advantages of using absolute loader.
2. List various functions of a loader.
3. Explain the concept of compile and go loader in detail.
4. Explain general loader scheme.
Lesson 6: Linkers
6.0 Objectives
6.1 Introduction
6.2 Subroutine Linkages
6.3 Relocating Loaders
6.4 Direct Linking Loaders
6.5 Relocation
6.6 Design of Absolute Loader
6.7 Boot strap Loaders
6.8 Dynamic Linking
6.9 MS-DOS Linker
6.10 Summary
6.11 Glossary
6.12 Answers to check your progress
6.13 References/Suggested Readings
6.14 Model Questions
6.0 Objectives:-
After studying this lesson, the student will be able to:
 Define the notion of linking.
 List different types of linkers.
 Explain the concept of relocation.
 Discuss the benefits of dynamic linking.
6.1 Introduction
A program is generally divided into a number of modules, and these modules may or may not be part of a single object file. Modules refer to each other by means of symbols. An object file contains defined "external" symbols, undefined "external" symbols, and local symbols.
Each source code file after compilation results in one object file. The role of the linker is to combine multiple object files into a single executable program. It does so by resolving the symbols. A library is a collection from which the linker can take objects. The linker is also responsible for arranging the objects in a program's address space, which may include relocating code. The compiler
assumes a fixed base address (for example, zero). Relocating the machine code usually involves re-targeting absolute jumps, loads, and stores.
Fig:6.1 Linker
6.2 Subroutine Linkages
Suppose that a program MAINR wishes to jump to some subprogram SUB. The code in program MAINR must contain the instruction "BSR SUB", which means branch to subroutine SUB. But the assembler is unaware of this symbol reference and will generate an error. The solution to this problem is called subroutine linkage. The problem occurs because the subroutine SUB is not written inside the program segment of MAINR.
The problem with the instruction "BSR SUB" is that the assembler is unaware of segment SUB and is not able to find the value of this symbolic reference. The assembler directive EXT is used to declare such a subroutine as external, and it should be added at the beginning of the segment MAINR. It informs the assembler that the subroutine is defined in some other segment. Variables in one segment that can be referred to by other segments are declared using the pseudo-op INT. This concept is known as subroutine linkage.
For example
MAIN    START
        EXT  SUB
        .
        .
        .
        CALL SUB
        .
        .
        END
SUB     START
        .
        .
        RET
        END
The subroutine SUB is declared as external at the beginning of MAIN. When a call to subroutine SUB is made, before making the unconditional jump, the current content of the program counter is stored on the internally maintained system stack. When the subroutine SUB returns (at RET), a pop is performed to restore the program counter of the caller routine with the address of the next instruction to be executed.
6.3 Relocating Loaders
Relocating loaders are needed by some operating systems; they adjust addresses (pointers) in the executable to compensate for variations in the address at which loading begins. Relocating loaders are needed for programs that are not always loaded at the same memory location. This helps improve memory utilization.
6.4 Direct Linking Loaders
The most common kind of loader is the direct linking loader, which is a relocatable loader. Source code cannot be accessed directly by the loader. Two methods can be used to load the object code into memory: one is to use relative addressing, and the other is to use absolute addressing. In the case of relative addressing, it is the responsibility of the assembler to provide information related to relative addresses to the loader.
The list of symbols which are not defined in the current segment but are used in it is stored in a data structure called the USE table. The USE table includes information such as the name of the symbol, its address, and its address relativity.
The list of symbols which are defined in the current segment and can be referred to by other segments is stored in a data structure called the DEFINITION table. The definition table includes information such as the symbol and its address.
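As an illustration, the two tables might be modeled as follows. All field names, symbols, and addresses here are assumed for the sketch, not taken from any particular loader:

```python
# Hypothetical DEFINITION and USE tables for a two-segment program; the
# field names and addresses are assumed purely for illustration.

definition_table = {
    "SUB": 0x200,      # symbol defined in another segment, with its address
    "DATA": 0x2F0,
}

use_table = [
    # symbol used in MAIN, the offset of the reference, and its relativity
    {"segment": "MAIN", "symbol": "SUB", "offset": 0x14, "relative": True},
    {"segment": "MAIN", "symbol": "DATA", "offset": 0x20, "relative": True},
]

# In its second pass the loader patches every use with the defined address:
code = {0x14: 0, 0x20: 0}          # placeholder address words in MAIN's code
for use in use_table:
    code[use["offset"]] = definition_table[use["symbol"]]
print(code)  # {20: 512, 32: 752}
```

The point of the two tables is exactly this pairing: the DEFINITION table supplies addresses, and the USE table tells the loader where to write them.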
The assembler produces the following types of cards:

1. ESD - The external symbol dictionary comprises information about all symbols that are defined in this program but referenced somewhere else. It contains:
1. Symbol Name
2. Type
3. Relative Location
4. Length
5. Reference No
2. TXT - The text card comprises the actual object code.
3. RLD - The relocation and linkage directory comprises information about the address-dependent instructions of a program. The RLD cards contain the following information:
1. Location of the constant that needs relocation
2. By what it has to be changed
3. The operation to be performed
4. The format of RLD
5. Reference No
6. Symbol
7. Flag
8. Length
9. Relative Location
4. END - It signifies the end of the program and specifies the starting address for execution.
Advantages: The main task of the loader is to load the object program into memory and prepare it for execution. In pass 1, the direct linking loader allocates segments and defines symbols. Each segment is assigned to the next available location after the preceding segment, in order to minimize the amount of storage required for the total program.
Disadvantages
• It is necessary to allocate, relocate, link, and load all of the subroutines each time in order to execute a program, so the loading process can be extremely time consuming.
• Though smaller than the assembler, the loader absorbs a considerable amount of space.
• Dividing the loading process into two separate programs, a binder and a module loader, can solve these problems.
6.5 Relocation
Relocation refers to the adjustment of code and data in the program to reflect the assigned load addresses. Relocation is achieved by the linker in association with symbol resolution. Symbol resolution refers to the process of searching files and libraries to replace symbolic references with actual usable addresses in memory before executing a program. Relocation can be done both at link time and at run time. Relocation at run time can be achieved using a relocating loader.
Object code generated by the assembler is executed after it is loaded into a specified location in the memory. The addresses of such object code are finalized only after the assembly process is over. Therefore, after loading,
Address of object code = assembled (relative) address of object code + relocation constant.
Both absolute and relative addresses can be used to map the object code in memory. Direct mapping of the object code in main memory can be achieved using absolute addresses. Mapping of the object code in main memory can also be achieved by adding the value of the relocation register to the relative address. This is called relocation. It can be achieved as follows:
1. The linker merges all sections (from all object files) of a similar type into a single section of that type. This is done to generate a single executable file. The linker then assigns run-time addresses to each section and each symbol.
2. Each section refers to one or more symbols which should be modified so that they point to the correct run-time addresses, based on information stored in a relocation table in the object file.
A relocation table is a list of pointers created by the assembler. These pointers are available in the object file. Each entry in the table points to an address in the object code that must be changed when the loader relocates the program.
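The relocation arithmetic described above can be sketched in a few lines. The word layout and table format below are simplified assumptions; a real loader works on actual machine words:

```python
# Sketch of load-time relocation: each relocation-table entry points at a
# word in the object code that holds an address assembled relative to zero.

def relocate(object_code, relocation_table, load_address):
    """Add the relocation constant to every word the table points at."""
    code = list(object_code)
    for offset in relocation_table:
        code[offset] += load_address        # patch the address-sensitive word
    return code

object_code = [0x10, 0x0004, 0x20, 0x0008]  # words 1 and 3 hold addresses
relocation_table = [1, 3]                   # produced by the assembler
loaded = relocate(object_code, relocation_table, 0x4000)
print([hex(w) for w in loaded])             # ['0x10', '0x4004', '0x20', '0x4008']
```

Note that words not listed in the table (here the opcode words 0x10 and 0x20) are left untouched; only address-sensitive words receive the relocation constant.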
Self assessment questions 1
1. An object file consists of which symbols?______________________________________________________________________
______________________________________________________________________
2. What does the BSR instruction stand for?
______________________________________________________________________
______________________________________________________________________
3. _______________ refers to the adjustment of code and data in the program to reflect the assigned load addresses.
4. Relocation table refers to a list of pointers created by the assembler. ( TRUE / FALSE )
______________________________________________________________________
6.6 Design of Absolute Loader
An absolute loader loads object files directly into the specified locations in the memory; since the object code carries absolute addresses, the loader performs no relocation. The programmer must have in-depth knowledge of memory management, and must be capable of handling the resolution of external references and the linking of different subroutines. The programmer should take care of two things. First, the starting address of each module to be used must be specified; in case any changes are made to any of the modules, the programmer must make the necessary changes in the starting addresses of the other modules. Second, in case of branching from one segment to another, the absolute starting address of the respective module must be known to the programmer so that it can be specified at the respective JMP instruction.
Fig:6.2 Absolute loader
6.7 Bootstrap Loaders
Instructions stored in the ROM of the computer system are generally executed as and when a computer
system is switched on. These instructions are used to examine the system hardware. The process of
examining whether all system hardware is functioning properly or not is called POST or power on self
test. It checks the CPU, memory, and basic input-output systems (BIOS) for errors and stores the result in
a special memory location. Once the POST is complete, BIOS begins to activate the computer's disk
drives. In most modern computers, when the computer activates the hard disk drive, it finds the first piece
of the operating system: the bootstrap loader.
The primary function of the bootstrap loader program is to load the operating system into memory and
allow it to begin operation. It reads the hard drive's boot sector to begin the process of loading
the computer's operating system. The bootstrap loader sets up small driver programs that interface with and
control the various hardware subsystems of the computer. It sets up the memory partitions that hold the
operating system, user information and applications. It also establishes the data structures to hold the
signals, flags and semaphores that are used to communicate within and between the subsystems and
applications of the computer. The last function of the bootstrap loader program is to hand over control of the
system to the operating system.
Alternatively referred to as bootstrapping, boot loader, or boot program, a bootstrap loader is a program that resides in the computer's EPROM, ROM, or other non-volatile memory.
6.8 Dynamic Linking
The last step of compiling is called linking. It is performed by a linker or link editor. A program
makes use of a number of libraries files other than the source code written by the programmer.
The linker inserts code to map the shared libraries to resolve the problem of program library
references. Under static linking, executable file of each program must include the copy of the
library file. It is fast and portable as it does not require the library files linked to the executable
file to be present on the system where it is run. But, it results in memory wastage. It is possible to
share these library files among various programs. The library files are loaded separately in the
memory. Dynamic linking then links these system libraries to the program just before or during
the program execution. Multiple programs can be linked to the same library file without having
to embed the same into its executable file. Only references to these sharable library files are
specified in the executable image. The linking takes place only when the file is executed. Also
only a single copy of that shared library file is loaded into the memory, even if it is linked to
multiple executable files.
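On a Unix-like system, run-time dynamic linking can be observed from Python through the ctypes module, which loads a shared library into the process and resolves a symbol from it. The fallback library name "libm.so.6" is an assumption that holds on glibc-based Linux systems:

```python
import ctypes
import ctypes.util

# Locate and load the shared C math library at run time; the fallback
# name "libm.so.6" assumes a glibc-based Linux system.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare sqrt's signature so ctypes passes and returns doubles correctly.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

# The symbol sqrt is resolved from the shared library at run time.
print(libm.sqrt(9.0))  # 3.0
```

Every process that does this shares the single in-memory copy of the library, which is exactly the memory saving that dynamic linking provides over static linking.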
Self assessment questions 2
5. Absolute loader creates ________________ object files.
6. POST stands for?
______________________________________________________________________
______________________________________________________________________
7. What is the advantage of static loading?______________________________________________________________________
6.9 MS DOS Linker
It refers to a linkage editor. It is used to combine multiple object modules to produce an executable program. An object module has a filename with the extension .OBJ and includes a binary image of the translated instructions and the program's data. The linker combines various .OBJ files and produces a file with the extension .EXE.
An object module of MS-DOS contains several different object record types.
 The THEADR record specifies the name of the object module. It is typically derived by the translator from the source file name.
 The PUBDEF record comprises a list of external symbols or public names declared in the segments of the object module.
 The EXTDEF record contains a list of external references used in this object module.
 Both PUBDEF and EXTDEF can contain information about the data type designated by an external name; these types are defined in the TYPEDEF record.
 The SEGDEF record describes the segments in the object module, with information regarding the segment's name, its length, the alignment requirement of its base address (e.g. word or paragraph, i.e. 16-byte, alignment), and whether the segment is relocatable or absolute.
 The LNAMES record includes a list of all the segment and class names used in the program.
 LEDATA records include the translated instructions and data from the source program, i.e. the binary image of the code and data produced by the translator.
 LIDATA records specify translated instructions and data that occur repetitively in the program.
 FIXUPP records contain information to resolve external references and to perform address modifications that are associated with the relocation of segments within the program.
 The MODEND record denotes the end of the module and can contain a reference to the starting point of the program.
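A rough sketch of how one such record might be built and parsed is given below. The layout shown (a type byte of 0x80 for THEADR, a 16-bit little-endian record length, a length-prefixed name, and a checksum byte that makes the record's bytes sum to zero) follows the commonly documented OMF convention, stated here as an assumption rather than a definitive specification:

```python
import struct

# Hedged sketch of a THEADR record in the MS-DOS (OMF) object format:
# type byte 0x80, 16-bit little-endian record length, length-prefixed
# module name, and a one-byte checksum.

def build_theadr(name: bytes) -> bytes:
    payload = bytes([len(name)]) + name
    length = len(payload) + 1                  # payload plus the checksum byte
    record = bytes([0x80]) + struct.pack("<H", length) + payload
    checksum = (-sum(record)) & 0xFF           # all bytes of a record sum to 0
    return record + bytes([checksum])

def parse_theadr(record: bytes) -> bytes:
    assert record[0] == 0x80, "not a THEADR record"
    (length,) = struct.unpack("<H", record[1:3])
    assert sum(record[: 3 + length]) & 0xFF == 0, "bad checksum"
    name_len = record[3]
    return record[4 : 4 + name_len]

rec = build_theadr(b"HELLO.ASM")
print(parse_theadr(rec))  # b'HELLO.ASM'
```

The linker reads a stream of such records, dispatching on the type byte, which is why every record carries its own length and checksum.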
6.10 Summary
Role of the linker is to combine multiple object files into a single executable program. The linker is also
responsible to arrange the objects in a program's address space which may include relocating code.
Relocation loaders are needed for programs that are not always loaded at the same memory
location. It helps in improving the memory utilization. Relocation refers to the adjustment of code
and data in the program to reflect the assigned load addresses. An absolute loader loads object files
directly into the specified locations in the memory. The primary
function of the bootstrap loader program is to load the operating system into memory and allow it to
begin operation. It is possible to share these library files among various programs. The library files are
loaded separately in the memory. Dynamic linking then links these system libraries to the program just
before or during the program execution. MS DOS linker is used to combine multiple object modules
to produce an executable program.
6.11 Glossary
Loader - The component used to load a program into memory.
Linker - The linker is used to combine multiple object files and library files into a single executable program.
Program counter - Special register that points to the location of the next instruction to be executed.
Pointer - A variable used to store the address of another variable or a function.
Relocation - It refers to the adjustment of code and data in the program to reflect the assigned load addresses.
6.12 Answers to check your progress
1. An object file contains defined "external" symbols, undefined "external" symbols, and localsymbols.
2. Branch to sub routine.
3. Relocation
4. TRUE.
5. re-locatable
6. POWER ON SELF TEST.
7. Static linking is fast and portable as it does not require the library files linked to the executablefile to be present on the system where it is run.
6.13 References/Suggested Readings
1. Systems Programming by John J Donovan, Tata McGraw-Hill
2. Systems Programming by D M Dhamdhere, Tata McGraw-Hill Education
3. Systems Programming by Charanjeet Singh, Kalyani Publications
6.14 Model Questions
1. What is the need of subroutine linkage?
2. Explain the role of bootstrap loader.
3. What is the advantage of dynamic linking?
4. Explain absolute loader.
5. What do you mean by a linker?
Lesson 7: Editors
7.0 Objective
7.1 Introduction
7.2 Types of text editor
7.3 Design of editor
7.4 Line Editor
7.5 Stream Editor
7.6 Screen Editor
7.7 Word Processor
7.8 Structure Editor
7.9 Summary
7.10 Glossary
7.11 Answers to check your progress
7.12 References/Suggested Readings
7.13 Model Questions
7.0 Objectives
After studying this lesson, the student will be able to:
 Define the concept of editors.
 Discuss the design of editors.
 Explain different types of editors available.
7.1 Introduction
Editors are used to draft or modify code or documents in an operating system. Different types of editors are available for the programmer to use. An editor is an interactive tool that allows the programmer to alter text on the run. Earlier, only simple text editors were available. The latest editors allow you to create documents with different formats and provide advanced features to insert various objects in them.
7.2 Types of Text Editors
Following are the various types of editors, based on how editing is performed and the output generated by them.
Line Editors - An end-of-line marker is used to delimit lines or identify the end of a line during original creation; during successive revision, the line number explicitly specifies the line.
Stream Editors - The idea is similar to the line editor, but the whole text is evaluated as a stream of characters, so a line number does not specify the location for revision. Locations for revision are specified either by using pattern contexts or through explicit positioning. Text-only documents can be best created using line and stream editors. Example: sed in Unix/Linux.
Screen Editors - A screen editor is used to view a document as a two-dimensional plane, and you can work on the document using the same two-dimensional plane. Content for revision can be specified at any place within the displayed portion. Example: vi, emacs, etc.
Word Processors - A word processor is an advanced type of editor. Besides all the basic functionalities of line and stream editors, it also provides support for the display of various objects like images and graphics, and provides a number of choices of fonts, styles, etc.
Structure Editors - A structure editor is used with specific types of documents. It is mainly used to keep the structure/syntax of the document intact.
7.3 Design of Editors
The major functions in editing are travelling, editing, viewing, and display. Travelling refers to the movement of the editing context to a new position inside the text. It may be implied in a user command or can be explicitly specified by the user. Viewing means formatting the text in the manner required by the user. This is an abstract view, independent of the physical aspects of an I/O device. The display function maps this abstract view onto the physical characteristics of the display device, such as a monitor or printer. It determines where a particular view appears on the user's screen. Separating the two functions, i.e. viewing and displaying, helps in designing multiple windows on the same screen, parallel edit operations using the same display terminal, etc. Most editors, however, tend to combine the two functions.
Figure 7.1 Editor structure
The figure above illustrates the schematic structure of a simple editor. For a given position of the editing context, the editing and viewing filters work on the internal form of the text to prepare the forms suitable for editing and viewing. These forms are then put in the editing and viewing buffers respectively. It is the responsibility of the viewing and display manager to make provisions for the appropriate display of text. When the cursor position changes, the filters operate on a new portion of text to update the contents of the buffers. Once editing has been performed, the editing filter reflects the changes into the internal form and updates the contents of the viewing buffer.
7.4 Line Editor
A line editor is a text editor in which each editing command is applied to one or more lines of text designated by the user. Line editors are very old and now obsolete. They were mainly used for interaction with a teleprinter. Teleprinter refers to an approach of connecting the printer directly with the keyboard, with no video display. No mechanism was provided to navigate interactively in a document.
Line editors are limited to typewriter-keyboard, text-oriented input and output methods. Editing using line editors happens a line at a time. Typing, editing, and document display do not occur simultaneously. Typically, typing does not enter text directly into the document. Instead, users modify the document text by entering terse commands on a text-only terminal. Commands and text, and corresponding output from the editor, scroll up from the bottom of the screen in the order that they are entered or printed to the screen. Although the commands typically indicate the
line(s) they modify, displaying the edited text within the context of larger portions of the document requires a separate command. A reference to the "current line", to which the entered commands are applied, is kept by the line editor. On the contrary, modern screen-based editors let the user interactively and directly move, select, and modify portions of the document. Normally, line numbers or a search-based context (especially when making alterations within lines) are used to specify which part of the document is to be edited or displayed.
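A minimal model of line-at-a-time editing can be sketched as follows. The command set is invented for illustration and only loosely echoes editors such as ed:

```python
# Toy model of a line editor: the buffer is a list of lines, a "current
# line" is tracked, and every command names the line(s) it applies to.

class LineEditor:
    def __init__(self, lines):
        self.lines = list(lines)
        self.current = 1                      # the "current line", 1-indexed

    def print_line(self, n):
        """Display line n (a separate command, as described above)."""
        self.current = n
        return self.lines[n - 1]

    def substitute(self, n, old, new):
        """Apply a substitution to line n only: line-at-a-time editing."""
        self.current = n
        self.lines[n - 1] = self.lines[n - 1].replace(old, new)

    def delete(self, n):
        del self.lines[n - 1]

ed = LineEditor(["MOV AX, 1", "ADD AX, BX", "RET"])
ed.substitute(2, "BX", "CX")                  # edit line 2 by number
print(ed.print_line(2))                       # ADD AX, CX
```

Note that nothing is displayed until explicitly asked for, mirroring the way typing, editing, and display do not occur simultaneously in a line editor.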
Self assessment questions 1
1. _____________________ is used to view a document as a two dimensional plane.
2. List various functions of editor.
______________________________________________________________________
______________________________________________________________________
3. ________________ refers to an approach of connecting the printer directly with thekeyboard, with no video display.
7.5 Stream Editor
A stream editor views the entire text as a sequence of characters. This means that the text is not delimited by end-of-line characters, and edit operations can be performed beyond the boundary of a line. Stream editors support character-, line-, and context-oriented commands. A stream editor performs text transformations on an input stream. That input stream may be a file or input from a pipeline. The pointer can be moved using positioning or search commands. There is a difference between the way the text is displayed and the way it appears on paper if printed. The only difference between the line editor and the stream editor is that the latter views the complete text as a single stream of characters. Locations for revision are specified either by explicit positioning or by using a pattern context, e.g. sed in Unix/Linux.
sed
sed makes only one pass over the input(s), and is consequently more efficient. Another key feature of sed is that it can filter text in a pipeline. This feature distinguishes sed from other types of editors.
In UNIX operating system, sed can be invoked as follows:
sed OPTIONS... [SCRIPT] [INPUTFILE...]
It is not mandatory to specify the input file. In case you wish to filter the text from the standard input, you can leave the INPUTFILE parameter empty or specify - for it. If none of the options specifies a script to be executed, sed considers the first non-option argument to be the script and not an input file.
Working of sed
Two data buffers are maintained by sed, named the active pattern space and the auxiliary hold space. Both buffers are initially empty. For every line of input from the input stream, sed performs the following cycle:
It reads a single line from the input stream and places it in the pattern space, removing the trailing newline. Commands are then executed for that line. Each command can have an address associated with it, which acts as a condition code: the command is executed only if the condition is verified for the line. Once the pointer reaches the end of the script, the contents of the pattern space are printed out to the output stream. The cycle is then repeated for the next line.
Unless special commands (such as 'D') are used to hold the pattern space between two cycles, the pattern space is deleted between two cycles. It is beyond the scope of this lesson to discuss all the commands of the sed editor.
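The basic cycle described above can, however, be modeled in a few lines. The sketch below imitates only a single substitute command applied to every line, not sed's addressing or its hold space:

```python
import re

# Toy model of the sed cycle: read a line into the pattern space (newline
# removed), run the substitute command over it, emit the result, and
# clear the pattern space before the next line.

def stream_edit(lines, pattern, replacement):
    output = []
    for line in lines:                      # one cycle per input line
        pattern_space = line.rstrip("\n")   # newline removed on reading
        pattern_space = re.sub(pattern, replacement, pattern_space)
        output.append(pattern_space)        # printed at the end of the cycle
    return output

text = ["hello world\n", "hello again\n"]
print(stream_edit(text, r"hello", "goodbye"))
# ['goodbye world', 'goodbye again']
```

Because each line is processed and emitted before the next is read, the editor never needs the whole file in memory, which is what makes stream editing suitable for pipelines.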
7.6 Screen Editor
1. A primitive form of editor that allows you to edit the text on the (display) screen by moving the cursor to the specific location.
2. Examples: Visual Studio, HTML editors, etc.
3. A screen editor is based on the principle of WYSIWYG, i.e. What You See Is What You Get.
4. It displays a screen full of text at a time.
5. In screen editors it is possible to see the effect of an edit operation on the screen.
6. In such editors the document is printed in the same form as it is displayed on the screen.
7. It allows us to see and access many lines at a time.
Examples
1. Pico
2. GNU Emacs
3. Xedit
4. Vi
7.7 Word Processor
A word processor is an advanced form of an editor. It is also popularly categorized as computersoftware application. It is capable of performing editing, formatting, insertion of non-textualobjects and also printing of documents.
Traditionally, the word processor provided the keyboard text-entry and printing functions of an electric typewriter, with tape or floppy disk used to record the text. Depending on the company, word processors supported a monochrome display and the ability to save documents on memory cards or diskettes. Modern word processors are very advanced. They provide support such as spell-checking, organizing text in table form, applying various font styles to text, sorting, and advanced searching, along with text formatting options like bold, italics, underline, font, and style. Users are allowed to move a section of text from one place to another, merge text, and search and replace words. Examples: WordStar and MS Word.
Self assessment questions 2
4. sed is an example of _________________ editor.
5. Two buffers maintained by sed are ___________ and _____________ .
______________________________________________________________________
______________________________________________________________________
6. A primitive form of editors that allows you to edit the text on the (display) screen bymoving the cursor to the specific location is called screen editor. ( TRUE / FALSE ).______________________________________________________
7.8 Structure Editor
A structure editor is used with different programming languages such as C++ or HTML. It helps to put the document or the code in a proper structural format by automatically inserting the appropriate delimiter symbols at the end of a block. It also provides automatic indentation at the beginning and end of each block. It aids in maintaining the particular structure/syntax used by a specified language. Some advanced structure editors also provide hints in terms of the name of the method that may follow a given object. Structure editors are generally embedded within software development tools like Eclipse or .NET.
Text editing can be combined with structure editing in a single user interface to form a hybrid editing tool. Emacs is an example of one such tool: it is a text editor that also supports the manipulation of words, sentences, and paragraphs as structures inferred from the text. PageMaker and Dreamweaver are two more examples of such editing tools; Dreamweaver is used for marked-up web documents and supports the display and manipulation of raw HTML text.
Structure editors for marked-up text such as HTML are useful for browsing through a document; the structure itself guides what editing is required.
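As a toy illustration of one structure-editor feature described above — automatically inserting the matching delimiter when a block is opened — consider the following sketch. The pairing table and function name are invented for the example and are not taken from any real editor.

```python
# Minimal sketch of structure-editor delimiter auto-closing:
# when the user types an opening delimiter, the editor inserts
# the matching closing delimiter immediately after it.

PAIRS = {"{": "}", "(": ")", "[": "]"}

def auto_close(text: str, typed: str) -> str:
    """Append the typed character; if it opens a block, also insert
    its matching close, as a structure editor would."""
    if typed in PAIRS:
        return text + typed + PAIRS[typed]
    return text + typed

buffer = ""
for ch in "int f({":
    buffer = auto_close(buffer, ch)
print(buffer)  # int f(){}
```

A real structure editor would also leave the cursor between the inserted delimiters and track indentation per block; this sketch shows only the pairing logic.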
7.9 Summary
A number of editors are available, differing in how editing is performed and in the output they generate. The major functions in editing are travelling, editing, viewing, and display. Each editing command in a line editor is applied to one or more lines of text designated by the user. A stream editor views the entire text as a sequence of characters and performs text transformations on an input stream, which may be a file or input from a pipeline. Screen editors allow you to edit text on the (display) screen by moving the cursor to a specific location. A word processor is an advanced form of editor, popularly categorized as a computer software application; it is capable of editing, formatting, inserting non-textual objects, and printing documents. A structure editor is used with programming or markup languages such as C++ or HTML; it aids in maintaining the particular structure and syntax of a given language, and some advanced structure editors also provide hints such as the names of methods that may follow a given object.
7.10 Glossary
Editor- Editors are used to draft or modify documents.
Stream- It refers to a sequence of characters.
sed- It is an example of editor used in UNIX/LINUX operating system.
Buffer- It refers to a temporary storage area.
Marked-up text- Text containing instructions to a web browser about the look and working of a web page.
7.11 Answers to check your progress
1. Screen editor.
2. Editing, travelling, display and viewing are major functions of an editor.
3. Teleprompter.
4. Stream.
5. The two buffers maintained by sed are active pattern space and auxiliary hold space.
6. TRUE.
7.12 References/Suggested Readings
1. Systems Programming by John J Donovan, Tata McGraw-Hill
2. Systems Programming by D M Dhamdhere, Tata McGraw-Hill Education
3. Systems Programming by Charanjeet Singh, Kalyani Publications
7.13 Model Questions
1. What is the role of an editor?
2. Explain the design of a simple editor.
3. How is a stream editor different from a line editor?
4. Explain the benefits of using a structure editor.
Lesson-8 Fundamentals of Compiler Design
Structure of lesson
8.0 Objective
8.1 Introduction to Compilers
8.2 Introduction to Translators
8.2.1 Various Types of Translators
8.3 Interpreters
8.3.1 Self Interpreters
8.3.2 Just-in-time Compilation
8.3.3 Byte code Interpreters
8.3.4 Abstract Syntax Tree Interpreters
8.3.5 Pros and Cons of Interpreter
8.4 Debuggers
8.4.1 Features of Debugger
8.4.2 How to Debug the Code
8.5 Bootstrapping For Compilers
8.6 Summary
8.7 Glossary
8.8 Answers to check your progress/self-assessment questions
8.9 References
8.10 Model questions
8.0 Objective
After reading this chapter the students will be able to:
• Discuss the need for and working of various translators.
• Explain how the various translators differ from one another in their requirements and implementation.
• Explain the concept of interpreters and their types.
• Evaluate the role and importance of debuggers and compilers.
8.1 Introduction to Compilers
Programs for early computers were written mainly in assembly language. Although the first high-level language is nearly as old as the first computer, the limited memory capacity of early machines posed serious technical difficulties when the first compilers were designed. The first high-level programming language (Plankalkül) was proposed by Konrad Zuse in 1943. The first compiler was written by Grace Hopper in 1952 for the A-0 programming language; A-0 worked more like a loader or linker than like the modern idea of a compiler. The first autocode and its compiler were developed by Alick Glennie in 1952 for the Mark 1 computer at the University of Manchester, and are considered by some to be the first compiled programming language. The FORTRAN team
led by John Backus at IBM is generally credited with having introduced the first complete compiler, in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960. In many application domains the idea of using a higher-level language quickly caught on. Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become more and more complex. Early compilers were themselves written in assembly language. The first self-hosting compiler – capable of compiling its own source code in a high-level language – was created in 1962 for LISP by Tim Hart and Mike Levin at MIT. Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices of implementation language. Building a self-hosting compiler is a bootstrapping problem: the first such compiler for a language must be compiled either by hand, by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) by running the compiler in an interpreter.
A compiler is a special program that processes statements written in a particular programming language and turns them into the machine language, or "code", that a computer's processor uses. Typically, a programmer writes language statements in a language such as Pascal or C, one line at a time, using an editor. The file that is created contains what are known as the source statements. The programmer then runs the appropriate language compiler, specifying the name of the file that contains the source statements.
When executing (running), the compiler first parses (analyses) all of the language statements syntactically, one after another, and then, in one or more successive stages or "passes", builds the output code, making sure that statements which refer to other statements are referred to correctly in the final code. Traditionally, the output of the compilation has been called object code, or sometimes an object module. (Note that the term "object" here is not related to object-oriented programming.) The object code is machine code that the processor can execute one instruction at a time. More recently, the Java programming language, a language used in object-oriented programming, has introduced the possibility of compiling output (called byte code) that can run on any computer system platform for which a Java virtual machine or byte-code interpreter is provided to convert the byte code into instructions executable by the actual hardware processor. Using this virtual machine, the byte code can optionally be recompiled at execution time by a just-in-time compiler.
Traditionally in some operating systems an additional step was needed after compilation – that of resolving the relative locations of instructions and data when more than one object module was to be run at the same time and the modules cross-referenced each other's instruction sequences or data. This process was sometimes called linkage editing and its output known as a load module.
A compiler works with what are sometimes called 3GLs and higher-level languages; an assembler works on programs written in a processor's assembly language. (See Fig. 8.1.) A compiler accepts as input a source program, usually written in a high-level language, and produces a corresponding target program, usually in assembly or machine language.
An important part of this translation process is that the compiler reports to its user the existence of errors in the source program.
Fig.8.1. A compiler
Two steps are involved in executing a program written in a high-level language. The source program must first be compiled, i.e. translated into an object program (see Fig. 8.2(a)). Then the object program is loaded into memory and executed (see Fig. 8.2(b)).
Fig.8.2. (a) Compilation Process
Fig.8.2. (b) Processing of Object Program
The compiler goes through many stages, called phases, before producing the target program. The diagram below shows the different phases of the compiler.
Figure 8.2 (c) Phases of Compiler
8.2 Introduction to Translators
A translator is a computer program that takes as input a program written in one language and produces as output a program in another language. The translator also performs another very important task: error detection. Any violation of the HLL specification is detected and reported to the programmer. The significant role of a translator is to translate the HLL program into an equivalent machine-language program, providing diagnostic messages whenever the programmer violates the specification of the HLL.
8.2.1 Various types of translators
The various types of translators are described below:
a) Assembler
b) Compiler
c) Interpreter
d) Decompiler
e) Disassembler
a) Assembler: An assembler is a computer program used to translate a program written in assembly language into machine language. The translated program is called the object program. The assembler checks each instruction for correctness and generates diagnostic messages if there are mistakes in the program (see Fig. 8.3). It translates mnemonic operation codes into their machine-language equivalents and assigns machine addresses to symbolic labels. Assembler pseudo-instructions, such as START or END, provide directions to the assembler itself and are not translated into machine instructions. The output of the assembler is called the object code or object program; it is usually machine code, also called machine language, which can be understood directly by a specific type of CPU (Central Processing Unit), such as the PowerPC. The flat assembler (FASM) is one example of an assembler.
Fig.8.3 Assembly Process
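The mnemonic translation and label resolution described above can be sketched as a tiny two-pass assembler. The three-instruction machine, its opcodes, and the instruction format below are all invented for the example; real assemblers such as FASM or MASM are far more elaborate.

```python
# Illustrative two-pass assembler for a made-up machine.
# Pass 1 assigns addresses to symbolic labels; pass 2 translates
# mnemonics to opcodes, resolving labels through the symbol table.

OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03}

def assemble(lines):
    symtab, words = {}, []
    # Pass 1: assign an address to every symbolic label.
    addr = 0
    for line in lines:
        if line.endswith(":"):
            symtab[line[:-1]] = addr
        else:
            addr += 1
    # Pass 2: translate mnemonics, resolving labels via the table.
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, operand = line.split()
        value = symtab[operand] if operand in symtab else int(operand)
        words.append((OPCODES[mnemonic] << 8) | value)
    return symtab, words

symtab, code = assemble(["start:", "LOAD 7", "ADD 1", "JMP start"])
print(symtab)                     # {'start': 0}
print([hex(w) for w in code])     # ['0x107', '0x201', '0x300']
```

Two passes are needed because a label such as `start` may be used before the line that defines it has been seen.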
Compilers are designed to convert source code into assembly language or some other programming language. An assembly language is a human-readable notation for the machine language that a specific type of CPU uses.
b) Compiler: A compiler is a program that translates a program written in a high-level language (HLL) into machine language. The process of transforming an HLL program into object code is lengthy and complex compared with assembly. Compilers have diagnostic capabilities and prompt the programmer with appropriate error messages while compiling an HLL program (see Fig. 8.4). The process is repeated until the program is error-free and translated into object code. Compilers can also link subroutines of the program. An example is the C++ compiler included in Microsoft Visual Studio.
Fig.8.4 Compiler
c) Interpreter: In computer science, an interpreter is a system program that directly executes, i.e. performs, instructions written in a programming or scripting language, without previously compiling them into a machine-language program. An interpreter generally uses one of the following strategies for program execution:
1. parse the source code and perform its behaviour directly;
2. translate the source code into some efficient intermediate representation and immediately execute it;
3. explicitly execute stored precompiled code produced by a compiler that is part of the interpreter system.
Early versions of the LISP programming language and Dartmouth BASIC are examples of the first kind. Perl, Python, MATLAB, and Ruby are examples of the second, while UCSD Pascal is an example of the
third kind: source programs are compiled ahead of time and stored as machine-independent code, which is then linked at run time and executed by an interpreter and/or compiler (for JIT systems). Some systems, such as Smalltalk, contemporary versions of BASIC, Java, and others, may combine the second and third strategies.
While interpretation and compilation are the two principal means by which programming languages are implemented, they are not mutually exclusive, as most interpreting systems also perform some translation work, just as compilers do. The terms "interpreted language" and "compiled language" merely mean that the canonical implementation of that language is an interpreter or a compiler, respectively. A high-level language is ideally an abstraction independent of particular implementations.
Fig.8.5 Interpreter
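The first execution strategy — parse the source and perform its behaviour directly, with no saved intermediate code — can be sketched as follows. The toy grammar (integers separated by + and -) is invented for brevity.

```python
# A toy direct interpreter: each run re-analyses the source text
# and immediately produces the result, with no object code saved.

def interpret(source: str) -> int:
    # Crude tokenizer: ensure operators are surrounded by spaces.
    tokens = source.replace("+", " + ").replace("-", " - ").split()
    result = int(tokens[0])
    i = 1
    while i < len(tokens):
        op, operand = tokens[i], int(tokens[i + 1])
        result = result + operand if op == "+" else result - operand
        i += 2
    return result

print(interpret("10 + 5 - 3"))  # 12
```

Note that nothing is kept between runs: interpreting the same source again repeats the whole analysis, which is exactly the inefficiency the later sections on byte code and JIT compilation address.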
d) Decompiler: A decompiler is a computer program that performs the reverse process of a compiler. It is normally applied to an executable program – the output of a compiler – and translates it back into source code in a high-level language. It is used for the recovery of lost source code, and is also useful for computer security, interoperability, and error correction.
e) Disassembler: A disassembler is a computer program that translates machine language into assembly language – the inverse of what an assembler does. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly language.
8.3 Interpreters
A computer is an integrated collection of hardware components capable of executing instructions, called "object code", stored in the computer's memory. The computer's control unit takes the object code, stored as a string of binary bits (i.e. 0s and 1s), converts the bits to voltage levels, and transmits the voltage levels to the hardware components, which carry out, or execute, the operations directed by those voltages, as specified in the object code. These steps of conversion, transmission, and execution by hardware components are called interpretation. An interpreter is also a translator, but rather than translating the source program into a target program, it translates each source instruction and directly produces its result – for example the product of a multiplication, rather than the target instructions that would compute that product. There is no saved translation
or target program, only the results of the computations. This assumes that, as the interpreter analyses the source program, it can produce the appropriate binary code for the voltages that direct the operation of the hardware components. An interpreter begins producing results sooner than a compiler, because it has fewer phases – e.g. no optimization phase – and it delivers the results of the computation as it interprets the source instructions. An interpreter is also simpler than a compiler, but the computations specified in the source program are carried out less efficiently. Below we study the different varieties of interpreters.
8.3.1 Self-interpreters
A self-interpreter is a programming-language interpreter written in the language it can interpret; an example is a BASIC interpreter written in BASIC. Self-interpreters are related to self-hosting compilers. If no compiler exists for the language to be interpreted, creating a self-interpreter requires implementing the language in a host language (which may be another programming language or assembler). With such a first interpreter available, the system is bootstrapped, and new versions of the interpreter can be developed in the language itself. It was in this way that Donald Knuth developed the TANGLE interpreter for the language WEB of the industrial-standard TeX typesetting system. Defining the semantics of a language is usually done in relation to an abstract machine (so-called operational semantics) or as a mathematical function (denotational semantics). A language may also be defined by an interpreter, in which case the semantics of the host language are taken as given. The definition of a language by a self-interpreter is not well founded (it cannot by itself define the language), but a self-interpreter tells a reader about the expressiveness and elegance of a language. It also enables the interpreter to interpret its own source code, the first step towards reflective interpreting.
8.3.2 Just-in-time compilation
Further blurring the distinction between interpreters, byte-code interpreters, and compilation is just-in-time compilation (JIT), a technique in which the intermediate representation is compiled to native machine code at runtime. This gives the efficiency of running native code, at the cost of startup time and increased memory use when the byte code or AST is first compiled. Adaptive optimization is a complementary technique in which the interpreter profiles the running program and compiles its most frequently executed parts into native code. Both techniques are a few decades old, appearing in languages such as Smalltalk in the 1980s. Just-in-time compilation has gained mainstream attention among language implementers in recent years, with Java, the .NET Framework, and most modern JavaScript implementations now including JITs.
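The adaptive-optimization idea above — profile the running program, then replace its hottest routines with compiled code — can be sketched as follows. Here "compiling" is faked by substituting a precomputed faster function; the threshold, class, and function names are all invented for the illustration.

```python
# Sketch of adaptive optimization: the runtime counts calls to
# each routine and, past a threshold, swaps in a faster
# "JIT-compiled" version. Both implementations compute n squared.

HOT_THRESHOLD = 3

def interpreted_square(n):
    # Deliberately slow "interpreted" path: repeated addition.
    total = 0
    for _ in range(n):
        total += n
    return total

def compiled_square(n):
    # Stands in for the native code a JIT would emit.
    return n * n

class AdaptiveRuntime:
    def __init__(self):
        self.counts = {}
        self.impl = {"square": interpreted_square}

    def call(self, name, arg):
        self.counts[name] = self.counts.get(name, 0) + 1
        if self.counts[name] == HOT_THRESHOLD:
            self.impl[name] = compiled_square  # "compile" hot code
        return self.impl[name](arg)

rt = AdaptiveRuntime()
for _ in range(4):
    print(rt.call("square", 5))              # 25 each time
print(rt.impl["square"] is compiled_square)  # True after 3 calls
```

The caller never notices the switch: both paths return the same result, which mirrors the requirement that JIT-compiled code be semantically identical to the interpreted code it replaces.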
8.3.3 Byte code interpreters
There is a spectrum of possibilities between interpreting and compiling, depending on the amount of analysis performed before the program is executed. For example, Emacs Lisp is compiled to byte code, which is a highly compressed and optimized representation of the Lisp source, but is not machine code (and therefore not tied to any particular hardware). This "compiled" code is then interpreted by a byte-code interpreter (itself written in C). The compiled code in this case is machine code for a virtual machine, which is implemented not in hardware but in the byte-code interpreter. The same approach is used with the Forth code used in Open Firmware systems: the source language is compiled into "F code" (a byte code), which is then interpreted by a virtual machine.
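The idea of byte code as "machine code for a virtual machine" can be shown with a minimal stack-based interpreter. The instruction set (PUSH, ADD, MUL) is invented for the example; real byte-code formats such as Emacs Lisp's are far denser.

```python
# A minimal stack-based byte-code interpreter: the "compiled"
# program is a list of instructions for a virtual machine that
# exists only in this loop, not in hardware.

def run(bytecode):
    stack = []
    for op, *args in bytecode:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# Byte code for (2 + 3) * 4:
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
print(run(program))  # 20
```

Because the program is already reduced to instructions, the per-execution cost is just this dispatch loop, with no parsing repeated — the advantage byte code holds over the direct interpretation of source text.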
8.3.4 Abstract syntax tree interpreters
In the spectrum between interpreting and compiling, another approach is to transform the source code into an optimized abstract syntax tree (AST), then execute the program by following this tree structure, or use the tree to generate native code just in time. In this approach, each sentence needs to be parsed just once. As an advantage over byte code, the AST keeps the global program structure and the relations between statements (which are lost in a byte-code representation), and when compressed it provides a more compact representation. Thus, the AST has been proposed as a better intermediate format for just-in-time compilers than byte code. It also allows the system to perform better analysis at runtime.
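A tree-walking evaluator makes the AST approach concrete: the source has already been parsed into a tree (nested tuples here, an invented representation), and execution simply follows the tree structure, so parsing happens only once no matter how often a node executes.

```python
# A sketch of an AST interpreter: evaluation recursively follows
# the tree. Interior nodes are (operator, left, right) tuples;
# leaves are integer literals.

def eval_ast(node):
    if isinstance(node, int):      # leaf: literal value
        return node
    op, left, right = node         # interior node: an operation
    lval, rval = eval_ast(left), eval_ast(right)
    return lval + rval if op == "+" else lval * rval

# AST for (1 + 2) * (3 + 4):
tree = ("*", ("+", 1, 2), ("+", 3, 4))
print(eval_ast(tree))  # 21
```

Unlike the flat byte-code list in the previous section, the tree preserves the nesting of expressions, which is the structural information the text says a JIT can exploit.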
8.3.5 Pros and Cons of Interpreter
The pros and cons of interpreters are discussed below.
Pros of Interpreter
• When execution speed is not essential, interpreters are valuable for program development.
• No separate compilation stage is needed, because execution can be completed in a single stage.
• Code can be modified at runtime.
• Interpreters are helpful for debugging, since source-code execution can be analysed in an IDE (Integrated Development Environment).
• They facilitate interactive code development.
Cons of Interpreter
• In a program with a loop, the same statement is translated every time it is encountered; interpreted programs are therefore usually slower in execution than compiled programs.
• Translation has to be repeated every time the program is run, because no object code is produced.
Check your progress/self-assessment questions
Q.1 What is the difference between a compiler and an interpreter?
_____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
Q.2 What are the phases of a compiler? Name them.
_________________________________________________________________________________
_________________________________________________________________________________
_________________________________________________________________________________
_________________________________________________________________________________
_____________________________________________________________________________
Q.3 Write down the disadvantages of interpreters.
_________________________________________________________________________________
_________________________________________________________________________________
_________________________________________________________________________________
_________________________________________________________________________________
____________________________________________________________________________
Exercise 1
a. Explain the different phases of a compiler.
b. What is the need for compilers?
c. Explain the different types of interpreters.
d. Explain translators and their different types, with examples.
8.4 Debuggers
A debugger or debugging tool is a computer program used to test and debug other programs (the "target" program). The code to be examined may alternatively run on an instruction set simulator (ISS), a technique that allows great power in its ability to halt when specific conditions are encountered, but which will typically be somewhat slower than executing the code directly on the appropriate (or the same) processor. Some debuggers offer two modes of operation – full or partial simulation – to limit this impact.
A "trap" occurs when the program cannot normally continue because of a programming bug or invalid data. For example, the program may have tried to use an instruction not available on the current version of the CPU, or attempted to access unavailable or protected memory. When the program "traps" or reaches a preset condition, the debugger typically shows the location in the original code if it is a source-level debugger or symbolic debugger, commonly now seen in integrated development environments. If it is a low-level debugger or a machine-language debugger, it shows the line in the disassembly (unless it also has online access to the original source code and can display the appropriate section of code from the assembly or compilation).
8.4.1 Features of Debugger
Typically, debuggers offer a query processor, a symbol resolver, an expression interpreter, and a debug support interface at their top level. Debuggers also offer more sophisticated functions, such as running a program step by step (single-stepping or program animation), stopping (breaking) – pausing the program to examine its current state – at some event or specified instruction by means of a breakpoint, and tracking the values of variables. Some debuggers can modify program state while it is running. It may also be possible to continue execution at a different location in the program, to bypass a crash or logical error.
The same functionality that makes a debugger useful for removing bugs also allows it to be used as a software-cracking tool to evade copy protection, digital rights management, and other software protection features. It is often also useful as a general verification tool, for fault coverage, and as a performance analyzer, especially if instruction path lengths are shown. Most mainstream debugging engines, such as gdb and dbx, provide console-based command-line interfaces. Debugger front-ends are popular extensions to debugger engines that provide IDE integration, program animation, and visualization features. Some early mainframe debuggers, such as Oliver and SIMON, provided this same functionality for the IBM System/360 and later operating systems, as long ago as the 1970s.
8.4.2 How to debug the code
The process of debugging is shown in Fig. 8.6. There are four major steps in debugging. First, the location of the error is found. Next, the design of the error repair is determined. Then the error is repaired using a suitable approach. Finally, the program is re-tested.
Fig.8.6. Debugging Process
There are two types of debugger: console-mode debuggers and graphical or visual debuggers. Console debuggers are usually part of the language, i.e. they are included in the standard libraries of the language. For a console debugger the user interface is the keyboard and a console-mode window; during execution of the program the lines of the source code pass through the console window. Visual debuggers are components of multi-featured IDEs (integrated development environments). They are powerful and easy to use. A visual debugger provides the same user interface as a graphical text editor, with a special margin area on the left for breakpoint symbols, the current-line pointer, and so on. The main features of debuggers are listed below:
• Debuggers allow breakpoints to be set in source code.
• They enable a single-step mode.
• They make it possible to view the memory of a process.
• They provide the functionality to change values at run time.
• They make it possible to view CPU registers.
• They make it easy to disassemble instructions.
• They can load a memory dump after a crash.
• They can attach to server processes.
GDB and WinDbg are two popular debuggers. GDB, the GNU Project debugger, allows a programmer to see the internal workings of another program; it runs on UNIX variants and on Microsoft Windows. WinDbg is a multipurpose debugger for Microsoft Windows; it helps to debug user applications, device drivers, and the operating system itself in kernel mode.
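The breakpoint and single-step facilities listed above rest on one mechanism: a hook the runtime invokes at each call and line. A toy version can be built on Python's standard `sys.settrace`; real debuggers such as gdb do far more, and the `BREAK_ON` set and log format here are invented for the illustration.

```python
# A toy line-level "debugger" using Python's sys.settrace hook.
# The tracer records a "break" when a watched function is entered
# and a "step" for every line executed inside traced frames.
import sys

BREAK_ON = {"target"}   # functions to "break" on (illustrative)
log = []

def tracer(frame, event, arg):
    if event == "call" and frame.f_code.co_name in BREAK_ON:
        # A real debugger would pause here and show local state.
        log.append(("break", frame.f_code.co_name))
    elif event == "line":
        log.append(("step", frame.f_lineno))   # single-step record
    return tracer   # keep tracing inside this frame

def target():
    x = 1
    x += 2
    return x

sys.settrace(tracer)
result = target()
sys.settrace(None)
print(result)                       # 3
print(("break", "target") in log)   # True
```

Visual debuggers layer a user interface over exactly this kind of callback: the margin breakpoint symbols and current-line pointer described above are driven by the same per-line events.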
8.5 Bootstrapping For Compilers
Bootstrapping is the process of writing a compiler in the source language it is intended to compile; such a compiler is also known as a self-hosting compiler. The notation used to represent a compiler is the T-diagram. Three languages are involved in its construction. The T notation is shown in Fig. 8.7.
Fig 8.7 T diagram
i. Source language (S): the language compiled by the newly written compiler.
ii. Implementation language (I): the language in which the new compiler is written.
iii. Target language (T): the language generated by the new compiler.
Suppose we have a compiler for a new language A, written in implementation language I and generating code for machine Z (written A_I_Z, with the subscript denoting the implementation language), and a compiler for I, written in B and generating code for B (written I_B_B). The two compilers are represented as:
A_I_Z   I_B_B
To obtain the new compiler A_B_Z, bootstrapping combines them by translating the first compiler with the second:
A_I_Z + I_B_B = A_B_Z
Bootstrapping has the following advantages:
• It is a non-trivial test of the language being compiled.
• Compiler developers only need to know the language being compiled.
• The compiler can be developed in a high-level language.
• Improvements to the compiler's back end improve not only general programs but the compiler itself.
Compilers for many programming languages have been bootstrapped, including compilers for Java, Lisp, Python, Pascal, ALGOL, and C.
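The T-diagram bookkeeping used above can be sketched in a few lines: a compiler is modelled as a (source, implementation, target) triple, and translating one compiler with another rewrites its implementation language, provided the languages match. The representation and function names are invented for the illustration.

```python
# Sketch of T-diagram composition: translating compiler S_I_T
# with a compiler for I yields the same compiler implemented in
# the translator's target language, e.g. A_I_Z + I_B_B = A_B_Z.

def translate(compiler, translator):
    s, i, z = compiler        # compiles s -> z, written in i
    ts, ti, tt = translator   # compiles ts -> tt, written in ti
    # The translator must accept the compiler's implementation
    # language as its own source language.
    assert ts == i, "implementation language mismatch"
    return (s, tt, z)         # same compiler, now written in tt

a_i_z = ("A", "I", "Z")   # compiler for A, written in I, emitting Z
i_b_b = ("I", "B", "B")   # compiler for I, written in B, emitting B
print(translate(a_i_z, i_b_b))  # ('A', 'B', 'Z')
```

The assertion captures the one rule of T-diagram composition: the languages along the joined edges must agree, which is why arbitrary compilers cannot be chained.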
8.6 Summary
In this chapter the roles of various translators – compiler, interpreter, assembler, and disassembler – along with their pros and cons have been explained. The working of debuggers, along with the detailed process of debugging, has also been explained. Bootstrapping for compilers has been discussed in detail.
8.7 Glossary
Compiler: A program that translates a program written in a high-level language into machine language.
Translator: A computer program that takes as input a program written in one language and produces as output a program in another language.
Interpreter: A computer program that directly executes instructions written in a programming language, without first compiling them into a machine-language program.
Assembler: A computer program used to translate a program written in assembly language into machine language.
Debugger: A program used for testing other programs.
Bootstrapping: The process of writing a compiler in the source language it is intended to compile.
Check your progress/self-assessment questions
Q.4 Define debugger.
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
Q.5 Write the features of a debugger.
_____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
Exercise 2
a. What is the role of a debugger in a compiler?
b. Explain the process of bootstrapping in a compiler.
c. Give the advantages and disadvantages of a debugger.
d. What is the process of debugging code?
8.8 Answers Check your progress/self-assessment questions
1.
No.  Compiler                                          Interpreter
1    Takes the entire program as input.                Takes a single instruction as input.
2    Intermediate object code is generated.            No intermediate object code is generated.
3    Conditional control statements execute faster.    Conditional control statements execute slower.
2.
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Machine-independent Code Optimizer
5. Code Generator
6. Machine-dependent Code Optimizer
3. a. Interpreted programs are usually slower in execution than compiled programs.
b. Translation has to be repeated every time the program is run, because no object code is produced.
4. A debugger, or debugging tool, is a computer program that is used to test and debug other programs (the "target" programs).
5. a. Debuggers offer a query processor, a symbol resolver, an expression interpreter, and a debug-support interface at the top level.
b. Debuggers also offer more sophisticated functions such as running a program step by step.
8.9 References/Suggested Readings
1. System Programming, John J. Donovan, Tata McGraw-Hill Education, 1st Edition.
2. Systems Programming and Operating System, D M Dhamdhere, Tata McGraw-Hill
Education, 1st Edition.
3. System Programming, A.A. Puntambekar, Technical Publications, 3rd Edition.
8.10 Model Questions
1. Explain various phases of compilers.
2. Explain various types of translators.
3. Explain the concept of bootstrapping.
Lesson-9 Finite automata and grammar
Structure of lesson
9.0 Objective
9.1 Introduction
9.2 Deterministic finite automata
9.2.1 Formal definition
9.2.2 String processing in DFA
9.3 Non deterministic finite automata
9.3.1 Formal definition
9.3.2 String processing in NDFA
9.4 Differences between DFA and NDFA
9.5 Equivalence of DFA and NDFA
9.5.1 Conversion of NDFA to DFA
9.5.2 Conversion of NDFA to DFA with the help of lazy creation method
9.6 Grammars
9.6.1 Formal Definition
9.7 Types of Grammars
9.8 Regular Grammar
9.9 Context Free Grammar
9.10 Context Sensitive Language
9.11 Unrestricted Grammars
9.12 Summary
9.13 Glossary
9.14 Answers to check your progress/self-assessment questions.
9.15 Model questions
9.16 References
9.0 Objective
After studying this lesson, students will be able to:
Explain Finite automata
Differentiate between deterministic finite automata and non-deterministic finite automata.
Explain String processing in DFA and NDFA.
Explain Equivalence of DFA and NDFA.
Discuss grammar and types of grammar.
9.1 Introduction to Finite automata
A finite-state automaton, or simply a state machine, is a mathematical model of computation used to design both computer programs and sequential logic circuits. It is viewed as an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time; the state it is in at any given moment is known as the current state. It can change from one state to another when triggered by an event or condition, and this change is known as a transition. A particular FSM is defined by a list of its states and the triggering condition for each transition.
The behaviour of state machines can be observed in many everyday devices that perform a predetermined sequence of actions depending on the sequence of events presented to them. Simple examples are vending machines, which dispense products when the proper combination of coins is deposited; elevators, which drop riders off at upper floors before going down; traffic lights, which change sequence when cars are waiting; and combination locks, which require the combination numbers to be entered in the proper order. Finite-state machines can model a large number of problems, among which are electronic design automation, communication protocol design, language parsing and other engineering applications. In biology and artificial-intelligence research, state machines or hierarchies of state machines have been used to describe neurological systems. In linguistics, they are used to describe simple parts of the grammars of natural languages. Considered as an abstract model of computation, the finite-state machine is weak: it has less computational power than some other models of computing, such as the Turing machine. That is, there are tasks which no FSM can do but some Turing machine can, because an FSM's memory is limited by its number of states. We can define an automaton as an abstract model of a digital computer. The following figure shows the essential features of a general automaton.
Figure 9.1 Finite automaton machine
The above figure shows the features of an automaton, explained below:
Input: The first step in the automation process is taking inputs and producing outputs. The input given to the automaton is a string over a given alphabet. An input tape is used for giving inputs to the automaton; it is divided into cells, each of which holds one symbol at a time.
Output: The inputs given to the automaton through the input tape are used to produce the outputs.
States of automata: For the automation process to take place, the system has a finite number of internal states, q1 to qn.
State relation: The states of the automaton are used to produce the outputs. Which internal state is entered next is determined by the present state and the present input.
Output relation: The next state after consuming an input symbol may be the same as the present state or a new state, depending on the input symbol read.
9.2 Deterministic finite automata (DFA)
A deterministic finite automaton (DFA), also called a deterministic finite-state machine, is a finite-state machine that accepts or rejects finite strings of symbols and produces a unique computation (or run) of the automaton for each input string.
Figure 9.2 Informal deterministic finite automata
9.2.1 Formal definition
A deterministic finite automaton consists of a 5-tuple (Q, Σ, δ, q0, F):
Q -> Finite set of states.
Σ -> Finite set of input symbols.
(δ : Q × Σ → Q) -> Transition function.
(q0 ∈ Q) -> Start state.
(F ⊆ Q) -> Acceptor (final) states.
A deterministic finite automaton consists of the 5-tuple explained above. The first element is the finite set of states Q, which keeps the machine finite. The second element, Σ, is the finite set of input symbols. The third element is considered the most important one: the transition function, written δ : Q × Σ → Q. The next element is the start state q0, from which the machine begins; it is one of the states of the finite set discussed above, so q0 ∈ Q. The last element is the set of acceptor states, written F and also known as the final states, in which the machine stops and accepts. Since every final state also belongs to the finite set of states, F ⊆ Q.
9.2.2 String processing by DFA
The model of a deterministic finite automaton shown below illustrates how strings are processed by a DFA during the automation process:
Figure 9.3 String processing by DFA
1. Input tape: The input tape is the basic component through which inputs are given to the system for producing outputs. It is divided into a number of cells, each of which contains a single symbol from the input alphabet Σ. The length of the input tape is delimited by the left and right end markers (¢, $); in their absence, the tape's length is considered infinite. The string between the end markers is processed.
2. Reading head: The reading head in the model above reads one symbol at a time from the cells of the input tape.
3. Finite control: The finite control takes the symbol under the reading head as input and produces a new state as output, moving the reading head along the tape to the next cell.
Example 1: Draw a DFA for strings containing 01.
Q= {q0,q1,q2}
Σ= {0,1}
Start state=q0
Final state F= {q2}
Transition table of the DFA for strings containing 01 (from the figure above):
δ      0    1
->q0   q1   q0
q1     q1   q2
*q2    q2   q2
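The table above translates directly into a small simulator. The following Python sketch (helper names are our own) runs the DFA of Example 1:

```python
# DFA of Example 1: accepts strings over {0,1} that contain "01".
# The dictionary below is exactly the transition table given above.

DELTA = {
    ('q0', '0'): 'q1', ('q0', '1'): 'q0',
    ('q1', '0'): 'q1', ('q1', '1'): 'q2',
    ('q2', '0'): 'q2', ('q2', '1'): 'q2',
}
START, FINAL = 'q0', {'q2'}

def accepts(w):
    state = START
    for ch in w:                 # the reading head consumes one cell at a time
        state = DELTA[(state, ch)]
    return state in FINAL        # accept iff the machine halts in a final state

print(accepts("1101"))  # True:  "01" occurs
print(accepts("1110"))  # False: "01" never occurs
```

Because the machine is deterministic, each input string produces exactly one run, as section 9.2 states.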
9.3 Non-deterministic finite automata (NDFA)
A nondeterministic finite automaton (NFA), or nondeterministic finite-state machine, does not have to obey the restrictions placed on a DFA: a state may have zero, one or several transitions for the same input symbol. In particular, every DFA is also an NFA. Using the subset construction algorithm, each NFA can be translated to an equivalent DFA, i.e. a DFA recognizing the same formal language; like DFAs, NFAs recognize only regular languages. Sometimes the term NFA is used in a narrower sense, meaning an automaton that actually violates one of the DFA restrictions, i.e. one that is not a DFA.
Figure 9.4: Informal Non-deterministic finite automata.
9.3.1 Formal definition
Non-deterministic finite automata are "non-deterministic" in the sense that the machine can be in more than one state at the same time: the outgoing transitions may be non-deterministic.
A non-deterministic finite automaton (NFA) consists of a 5-tuple (Q, ∑, δ, q0, F):
Q -> a finite set of states
∑ -> a finite set of input symbols (the alphabet)
q0 -> a start state
F -> a set of final states
δ -> a transition function, a mapping from Q × ∑ to subsets of Q.
9.3.2 String processing in NDFA
String processing in an NDFA differs from that in a DFA. For a given input there can be more than one legal sequence of steps: once a state receives an input symbol, it may have more than one choice for the next move, and this is what makes the machine non-deterministic. The same result can be achieved by computing all legal sequences in parallel and then deterministically searching for a legal sequence that accepts the input, although such parallel branching does not correspond directly to anything in physical computer systems. The following figure shows deterministic and non-deterministic computations with their branches:
Figure 9.5 DFA and NDFA computations with branches
Example 2: Draw an NDFA for strings containing 01.
Q= {q0,q1,q2}
∑={0,1}
Start state=q0
Final state F={q2}
Transition table of the NDFA for strings containing 01 (from the figure above):
9.4Differences between DFA and NDFA
DFA NDFA
DFA stands for deterministic finite automata. NDFA stands for non-deterministic finite
automata.
Each transition leads to only one state that
makes the whole transition scenario
deterministic.
A transition could lead to a subset of states
making the transition scenario non-
deterministic.
DFA doesn’t have the ability to use empty
string transition.
NDFA has the ability to use the empty string
transition.
Accepts input if the last state is in F.i.e. Final
state.
Accepts input if one of the last states is in F.
As per memory usage DFA requires a lot
memory space.
As per memory usage NDFA requires lesser
memory space.
The real timely implementation of DFA can be
done easily.
The real timely implementation of NDFA can't
be done without converting NDFA to DFA.
Backtracking is allowed in DFA. In NDFA, backtracking may or may not be
allowed depending on the problem statement..
δ 0 1
->q0 { q0,q1 } {q0 }
q1 ɸ {q2 }
*q2 {q2} {q2}
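The branching computation can be simulated by tracking the set of all states the machine could currently be in. Below is a minimal Python sketch of the NFA from Example 2 (state names as in the table; helper names are our own):

```python
# NFA of Example 2: the set of all currently possible states is tracked,
# simulating the parallel branches shown in Figure 9.5.

DELTA = {
    ('q0', '0'): {'q0', 'q1'}, ('q0', '1'): {'q0'},
    ('q1', '0'): set(),        ('q1', '1'): {'q2'},
    ('q2', '0'): {'q2'},       ('q2', '1'): {'q2'},
}
START, FINAL = 'q0', {'q2'}

def accepts(w):
    current = {START}
    for ch in w:
        # follow every possible transition from every current state
        current = {t for s in current for t in DELTA[(s, ch)]}
    return bool(current & FINAL)   # accept if ANY possible last state is final

print(accepts("1011"))  # True:  one branch reaches q2 via "01"
print(accepts("110"))   # False: no branch can reach q2
```

Note the acceptance rule matches the table in section 9.4: the input is accepted if one of the possible last states is in F.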
9.5 Equivalence of DFA and NDFA
Since reading above gives us an idea that both DFA and NDFA recognizes same class of
languages which is known as regular language’s which we will cover in next section. The
equivalence of DFA and NDFA is very useful because designing an NDFA is much easierthen
that of DFA which not only consumes a lot time but memory of the system too. For every non
deterministic finite automata, there exist an equivalent deterministic finite automata.
The equivalence between DFA and NDFA can be done by simulating the moves of NFA in
parallel. Every state of DFA will be represented by some subset of set of states in NDFA .If the
NDFA contains the number of states as n then the equivalent DFA will contain 2n states.
Theorem: A language L is accepted by a DFA if and only if it is accepted by an NDFA.
Proof: Given any NFA N, we can construct a DFA D such that L(N) = L(D).
Construction of a DFA from an NFA:
Observation: the transition function of an NFA maps to subsets of states.
Idea: make one DFA state for every possible subset of the NFA states (2^n subsets for an NFA with n states).
Let N = (Q_N, ∑, δ_N, q0, F_N).
Goal: build D = (Q_D, ∑, δ_D, {q0}, F_D) such that L(D) = L(N).
Construction:
1. Q_D = all subsets of Q_N (i.e. the power set).
2. F_D = the set of subsets S of Q_N such that S ∩ F_N ≠ Φ.
3. δ_D: for each subset S of Q_N and for each input symbol a in ∑:
δ_D(S, a) = ∪ (for p in S) δ_N(p, a)
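The construction can be sketched in a few lines of Python. Names are illustrative; note that only subsets reachable from the start state are generated, which is essentially the lazy-creation idea mentioned in section 9.5.2:

```python
def nfa_to_dfa(nfa_delta, alphabet, start, nfa_final):
    """Subset construction: each DFA state is a frozenset of NFA states.
    Only subsets reachable from {q0} are actually built."""
    start_set = frozenset([start])
    dfa_delta, seen, todo = {}, {start_set}, [start_set]
    while todo:
        S = todo.pop()
        for a in alphabet:
            # delta_D(S, a) = union of delta_N(p, a) over all p in S
            T = frozenset(t for p in S for t in nfa_delta.get((p, a), set()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_final = {S for S in seen if S & nfa_final}   # subsets with S ∩ F_N ≠ Φ
    return dfa_delta, start_set, dfa_final, seen

# The NFA of Example 2 (strings over {0,1} containing "01"):
NFA = {('q0', '0'): {'q0', 'q1'}, ('q0', '1'): {'q0'},
       ('q1', '1'): {'q2'},
       ('q2', '0'): {'q2'}, ('q2', '1'): {'q2'}}
delta, d_start, d_final, d_states = nfa_to_dfa(NFA, '01', 'q0', {'q2'})
print(len(d_states))  # 4 reachable subsets, far fewer than 2**3 = 8
```

The worst case is still 2^n states, as the text says, but in practice far fewer subsets are reachable.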
9.5.1 Conversion of NDFA to DFA
The transition from one state to another state is not deterministic in the case of non-deterministic
finite automata i.e. for a particular symbol there may be more than one move leading to different
states. Here we will discuss how the conversion from NDFA to DFA takes place:
9.5.2 Conversion of NDFA to DFA with the help of lazy creation method
Check Your Progress/Self-Assessment Questions.
Q.1 Differentiate DFA and NDFA
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
___________________________________
Q.2 Write down the formal definition of NDFA
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
___________________________________
Q.3 Finite automata require minimum _______ number of stacks.
a) 1
b) 0
c) 2
9.6 Grammar
A set of production rules used for string processing in the theory of computation is called a grammar, or formal grammar. These production rules describe how strings over the language's alphabet are formed according to the syntax required by the language. A formal grammar is thus a set of rules for rewriting strings, along with a "start symbol" from which rewriting begins; a grammar is therefore usually thought of as a language generator. Alternatively, it can sometimes be used as the basis for a "recognizer": a function in computing that determines whether a given string belongs to the language or is syntactically incorrect. To describe such recognizers, formal language theory uses separate formalisms, known as automata theory. One of the interesting results of automata theory is that it is impossible to design a recognizer for certain formal languages. Parsing is the process of recognizing an utterance (a string, in natural languages) by breaking it down into a sequence of symbols and analysing each one against the grammar of the language. Most languages have the meanings of their utterances structured according to their syntax, a practice known as compositional semantics.
9.6.1 Formal Definition
Formally, a grammar is described by a 4-tuple, proposed by Noam Chomsky in the 1950s. A grammar is denoted by the variable G and consists of the following components:
G = {V, Σ, P, σ}
where
V is the set of terminal symbols;
Σ is the set of non-terminal symbols, with the restriction that V and Σ are disjoint;
σ is the start symbol;
P is the set of production rules,
i.e. A –> B
where:
A is a sequence of symbols containing at least one non-terminal;
B is the result of replacing some non-terminal symbol in A with a sequence of symbols (possibly empty) from V and Σ.
To understand formal grammars, we will take a sentential form and try to derive a valid sentence from it with a parse tree.
V = {"the", "a", "mouse", "pig", "saw", "chased"}
Σ = {S, NP, VP, D, N, V}
where
S denotes a sentence and D a determiner,
NP a noun phrase and N a noun,
VP a verb phrase and V a verb.
σ = S. Starting from the initial symbol, we have the following production rules:
S –> NP VP,
NP –> D N,
VP –> V NP,
D –> ”the”, D –> “a”,
N –> ”mouse”, N –> ”pig”,
V –> “saw”, V –> “chased”
We will use a leftmost derivation to form a sentence from the production rules. The leftmost derivation proceeds as follows:
S –> NP VP
–> D N VP
–> "the" N VP
–> "the" "mouse" VP
–> "the" "mouse" V NP
–> "the" "mouse" "chased" NP
–> "the" "mouse" "chased" D N
–> "the" "mouse" "chased" "a" N
–> "the" "mouse" "chased" "a" "pig"
The last step of the derivation yields the required sentence, for which a number of production rules were used; production rules allow us to form the sentence step by step. We can also use a parse tree to picture the formation of the sentence more clearly. Below, a parse tree is given for the derivation of the sentence through the production rules.
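The leftmost derivation above can be mechanized in a few lines. The rule encoding and helper function below are our own illustrative sketch, not a standard parser API:

```python
# Leftmost derivation using the production rules of section 9.6.1:
# at every step the LEFTMOST non-terminal is expanded.

RULES = {
    'S':  [['NP', 'VP']],
    'NP': [['D', 'N']],
    'VP': [['V', 'NP']],
    'D':  [['the'], ['a']],
    'N':  [['mouse'], ['pig']],
    'V':  [['saw'], ['chased']],
}
NONTERMINALS = set(RULES)

def leftmost_derive(form, choices):
    """Expand the leftmost non-terminal at each step; `choices` picks
    which alternative of the relevant rule to apply."""
    steps = [form[:]]
    for pick in choices:
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        form = form[:i] + RULES[form[i]][pick] + form[i + 1:]
        steps.append(form[:])
    return steps

steps = leftmost_derive(['S'], [0, 0, 0, 0, 0, 1, 0, 1, 1])
print(' '.join(steps[-1]))  # the mouse chased a pig
```

Each intermediate entry in `steps` is one sentential form of the derivation, mirroring the lines shown above.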
9.7 Types of Grammars
Grammars are used to form sentences by means of production rules. According to the Chomsky hierarchy, grammars can be divided into the following types:
Grammar Automata
Regular grammars Finite-automata
Context-free grammar Push-down automata
Context-sensitive grammar Linear-bounded automata
Unrestricted grammar Turing machine
9.8 Regular Grammar
A regular grammar is a restricted grammar in which the right-hand side of each rule may contain only the empty string, a single terminal symbol, or a single terminal symbol followed by a non-terminal symbol. The left-hand side of each rule contains only a single non-terminal symbol.
Consider the language {a^n b^m | m, n ≥ 1}, in which the numbers of a's and b's must each be greater than or equal to 1. This language is regular, in contrast to the language {a^n b^n | n ≥ 1}, which is not. A grammar G for it has N = {S, A, B} and Σ = {a, b}, with S as the start symbol and the production rules:
1. S -> aA
2. A -> aA
3. A -> bB
4. B -> bB
5. B -> ε
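As a small sketch (the `derive` helper is our own), the five rules above can be applied mechanically to derive any string a^n b^m:

```python
# Deriving a^n b^m (n, m >= 1) with the right-linear rules of section 9.8.
# Each replace() call performs exactly one rule application.

def derive(n, m):
    s = 'S'
    s = s.replace('S', 'aA')             # rule 1: S -> aA
    for _ in range(n - 1):
        s = s.replace('A', 'aA')         # rule 2: A -> aA
    s = s.replace('A', 'bB')             # rule 3: A -> bB
    for _ in range(m - 1):
        s = s.replace('B', 'bB')         # rule 4: B -> bB
    return s.replace('B', '')            # rule 5: B -> ε

print(derive(1, 1))  # ab
print(derive(3, 2))  # aaabb
```

Note that at every step the sentential form contains at most one non-terminal, always at the right end; this is exactly what makes the grammar right-linear and the language regular.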
9.9 Context Free Grammar
The job of an automaton is to recognize a language: to receive a word as input and answer the yes-or-no question of whether it is in the language. But we can also ask what kind of process we need to generate a language. In terms of human speech, recognition corresponds to listening to a sentence and deciding whether or not it is grammatical, while generation corresponds to making up, and speaking, our own grammatical sentence. A context-free grammar is a model considered by the Chomsky school of formal linguistics. The idea is that sentences are recursively generated from internal mental symbols through a series of production rules. For instance, if S, N, V, and A correspond to sentences, nouns, verbs, and adjectives, we might have rules such as:
S → N V N,
N → A N,
and so on, and finally rules that replace these symbols with actual words. We call these rules context-free because they can be applied regardless of the context of the symbol on the left-hand side, i.e., independently of the neighboring symbols. Each sentence corresponds to a parse tree, and parsing the sentence allows us to understand what the speaker has in mind. Formally, a context-free grammar G comprises a finite alphabet V of variable symbols, a start symbol S ∈ V, a finite alphabet T of terminal symbols in which the final word must be written, and a finite set R of production rules that let us replace a single variable symbol with a string composed of variables and terminals:
A → s, where A ∈ V and s ∈ (V ∪ T)*.
We say that the grammar G generates the language L(G) ⊆ T* containing all terminal words w derivable from S.
For instance, the grammar with V = {S} and T = {(, ), [, ]} gives rise to the language D2 of balanced strings over two types of brackets:
S → (S)S, [S]S, Ԑ
This gives us three alternatives: S may be replaced by (S)S or by [S]S, or by the empty string Ԑ.
A language generated by a context-free grammar is called a context-free language.
The language {a^n b^n | n ≥ 0} is a context-free language, as it can be generated from the production rule S → aSb, Ԑ, while the language of palindromes Lpal is context-free thanks to the production rule S → aSa, bSb, a, b, Ԑ.
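A minimal sketch of the {a^n b^n} example, with an illustrative derivation helper and a direct membership check (names are our own):

```python
# The grammar S -> aSb | ε generates exactly {a^n b^n | n >= 0}.

def derive_anbn(n):
    """Apply S -> aSb exactly n times, then S -> ε."""
    s = 'S'
    for _ in range(n):
        s = s.replace('S', 'aSb')
    return s.replace('S', '')

def in_anbn(w):
    """Direct membership check: w must equal a^n b^n for some n >= 0."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == 'a' * n + 'b' * n

print(derive_anbn(3))           # aaabbb
print(in_anbn(derive_anbn(5)))  # True
print(in_anbn('aabbb'))         # False
```

No finite automaton can perform this check, since matching the counts of a's and b's requires unbounded memory; a pushdown automaton (with its stack) can.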
9.10 Context Sensitive Language
In a context-free grammar, the left-hand side of every production rule is a single variable symbol. Context-sensitive grammars, by contrast, permit productions to depend on neighboring variables, and thus replace one finite string with another. In return, we demand that the grammar be non-contracting, in that productions never decrease the length of the string. The rules are thus of the form:
u → v, where u, v ∈ (V ∪ T)* and |u| ≤ |v|.
If we like, we can demand that u contain at least one variable. To generate the empty word, we permit the rule S → Ԑ as long as S does not appear on the right-hand side of any production rule. We say that a language is context-sensitive if it can be generated by a context-sensitive grammar. A context-sensitive grammar for the copy language can be written with V = {S, C, A, B}, T = {a, b}, and the production rules
S → aSA, bSB, C
CA → Ca, a
CB → Cb, b
aA → Aa, aB → Ba, bA → Ab, bB → Bb.
These rules let us produce a palindrome-like string, for example abSBA. We then change S to C and use C to convert the upper-case variables A, B into the lower-case terminals a, b, moving the terminals to the right so as to reverse the order of the second half. With the last of these conversions we erase C, leaving us with a word in Lcopy.
For instance, here is the derivation of abab:
S → aSA
→ abSBA
→ abC BA
→ abCbA
→ abCAb
→ abab .
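The derivation above can be replayed mechanically. The `step` helper below is an illustrative sketch that applies one rule at the leftmost position where its left-hand side matches:

```python
# Replaying the derivation of abab with the context-sensitive rules above.

def step(s, lhs, rhs):
    """Apply the rule lhs -> rhs once, at the leftmost match."""
    assert lhs in s, f'rule {lhs} -> {rhs} is not applicable to {s!r}'
    return s.replace(lhs, rhs, 1)

s = 'S'
for lhs, rhs in [('S', 'aSA'),   # S  -> aSA
                 ('S', 'bSB'),   # S  -> bSB
                 ('S', 'C'),     # S  -> C
                 ('CB', 'Cb'),   # CB -> Cb
                 ('bA', 'Ab'),   # bA -> Ab  (move A left, past b)
                 ('CA', 'a')]:   # CA -> a   (erase C, emit the final a)
    s = step(s, lhs, rhs)
    print(s)
# prints: aSA, abSBA, abCBA, abCbA, abCAb, abab
```

Note that every rule in the sequence is non-contracting, as the definition of a context-sensitive grammar requires.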
9.11 Unrestricted Grammars
An unrestricted grammar is a grammar G = (N, Σ, P, S) where N denotes the non-terminal symbols, Σ the terminal symbols, P the production rules of the form a -> b, in which a and b are strings of symbols in the union of N and Σ with a non-empty, and S is the start symbol belonging to N. As the name suggests, it is a kind of grammar in which no restrictions are imposed on either side of the production rules. It can be shown that unrestricted grammars generate exactly the recursively enumerable languages. This is the same as saying that for each unrestricted grammar G there exists some Turing machine capable of recognizing L(G), and vice versa. Given an unrestricted grammar, such a Turing machine is simple enough to construct, as a two-tape nondeterministic Turing machine: the first tape contains the input word w to be tested, and the second tape is used by the machine to generate sentential forms from G.
Table 1: Tabular classification of the types of grammars
Type 0 – Unrestricted grammars (general rewrite grammars). Production rules: any sequence of symbols may be transformed into any other sequence of symbols. Recognizing automaton: Turing machine; storage: infinite tape; parsing complexity: undecidable.
Type 1 – Context-sensitive grammars. Production rules: aAz -> aBC…Dz, where A is a non-terminal symbol, a and z are sequences of zero or more terminal or non-terminal symbols, and BC…D is any sequence of terminal or non-terminal symbols. Recognizing automaton: linear bounded automaton (non-deterministic Turing machine); storage: tape a linear multiple of the input length; parsing complexity: NP-complete.
Type 2 – Context-free grammars. Production rules: A -> BC…D, where A is a non-terminal symbol and BC…D is any sequence of terminal or non-terminal symbols. Recognizing automaton: pushdown automaton; storage: pushdown stack; parsing complexity: O(n^3).
Type 3 – Regular grammars (finite-state grammars). Production rules: A -> xB and C -> y, where A, B and C are non-terminal symbols and x, y are terminal symbols. Recognizing automaton: finite-state automaton; storage: finite; parsing complexity: O(n).
The grammar types are ranked according to the difficulty of parsing them. In general, a grammar with a higher type number is easier to parse; as the type number decreases, the parsing complexity increases. Regular grammars are considered the best behaved, since their parsing complexity is the lowest, which places them at the top; they are followed by context-free grammars, then context-sensitive grammars, and finally unrestricted grammars, which have the highest parsing difficulty. A description of this hierarchy of grammars, indicating the parsing difficulty of each class, is given in the figure below.
9.12 Summary
A state machine is in only one state at a time; the state it is in at any given moment is known as the current state. It can change from one state to another when triggered by an event or condition; this is known as a transition. A particular FSM is defined by a list of its states and the triggering condition for each transition. A set of production rules used for string processing in the theory of computation is called a grammar, or formal grammar. These production rules describe how strings over the language's alphabet are formed according to the syntax required by the language. A formal grammar is thus a set of rules for rewriting strings, along with a "start symbol" from which rewriting begins.
9.13 Glossary
Finite automaton: A finite-state automaton (plural: automata), or simply a state machine, is a mathematical model of computation used to design both computer programs and sequential logic circuits.
DFA: A finite-state machine that accepts or rejects finite strings of symbols and produces a unique computation (or run) of the automaton for each input string.
NDFA: A finite-state machine in which a state may have zero, one or several transitions for the same input symbol. Using the subset construction algorithm, each NFA can be translated to an equivalent DFA, i.e. a DFA recognizing the same formal language; like DFAs, NFAs recognize only regular languages.
Grammar: A set of production rules used for string processing in the theory of computation is called a grammar, or formal grammar.
Regular grammar: A restricted grammar in which the right-hand side of each rule may contain only the empty string, a single terminal symbol, or a single terminal symbol followed by a non-terminal symbol.
Context-free grammar: A context-free grammar is a model considered by the Chomsky school of formal linguistics. The idea is that sentences are recursively generated from internal mental symbols through a series of production rules.
Context-sensitive language: A language generated by a context-sensitive grammar, whose non-contracting productions may depend on neighboring symbols; the start symbol may be erased only if it does not appear on the right-hand side of any production rule.
Unrestricted grammar: As the name suggests, a kind of grammar in which no restrictions are imposed on either side of the production rules.
Check Your Progress/Self-Assessment Questions.
Q.4 Define Grammar?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
____________________________
Q.5 what are the different types of Grammar?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
_____________________
9.14 Answers to check your progress/self-assessment questions.
1.
DFA stands for deterministic finite automata; NDFA stands for non-deterministic finite automata.
In a DFA, each transition leads to exactly one state, which makes the transitions deterministic; in an NDFA, a transition can lead to a subset of states, making the transitions non-deterministic.
A DFA cannot use empty-string (ε) transitions; an NDFA can.
2. Non-deterministic finite automata are “non-deterministic” implying that the machine can exist
in more than one state at the same time. In NDFA the outgoing transitions could be non-
deterministic.
A non-deterministic finite automaton (NFA) consists of a 5-tuple (Q, ∑, δ, q0, F):
Q -> a finite set of states
∑ -> a finite set of input symbols (the alphabet)
q0 -> a start state
F -> a set of final states
δ -> a transition function, a mapping from Q × ∑ to subsets of Q.
3. Option b
4. A set of production rules that are used for string processing in theory of computation is called
grammar, or formal grammar.
5. a) Regular grammar
b) Context-free grammar
c) Context-sensitive grammar
d) Unrestricted grammar
9.15 Model questions
Q1.Given the language L = {ab, aa, baa}, which of the following strings are in L*?
1) abaabaaabaa
2) aaaabaaaa
3) baaaaabaaaab
4) baaaaabaa
Q2.Let w be any string of length n is {0,1}*. Let L be the set of all substrings of w. What is the
minimum number of states in a non-deterministic finite automaton that accepts L?
A n-1
B n
C n+1
D 2n-1
Q3. A minimum-state deterministic finite automaton accepting the language L = {W | W ε {0,1}*, the numbers of 0s and 1s in W are divisible by 3 and 5, respectively} has
A 15 states
B 11 states
C 10 states
D 9 states
Q4. Given an arbitrary non-deterministic finite automaton (NFA) with N states, the maximum number of states in an equivalent minimized DFA is at least
A N^2
B 2^N
C 2N
D N!
9.16 References/Suggested Readings
1. System programming, John J Donovan, Tata McGraw-Hill Education, 1st Edition.
2. Systems Programming and Operating System, D M Dhamdhere, Tata McGraw-Hill
Education, 1st Edition.
3. System programming, A.A. Puntambekar, Technical publications, 3rd Edition.
Lesson-10 Phases of Compiler Design
Structure of lesson
10.0 Objective
10.1 Major Parts of Compiler Design
10.2 Phases of Compiler Design
10.2.1 Lexical Analysis
10.2.2 Syntax Analysis
10.2.3 Semantic Analysis
10.2.4 Intermediate Code Generation
10.2.5 Code Generation
10.2.6 Code Optimization
10.3 Error Recovery
10.4 Symbol Table
10.5 Summary
10.6 Glossary
10.7 Answers to check your progress/self-assessment questions.
10.8 Answers to Check your progress/self-assessment questions.
10.9 Modal Questions
10.10 References
10.0 Objective
After reading this chapter, the students will be able to:
Explain the detailed process of compiler.
Explain various phases of compiler.
Implement each phase work with compilation process.
Explain optimization of code and their significance.
10.1 Major Parts of Compiler Design
The two major parts of a compiler are the analysis phase and the synthesis phase. In the analysis phase, an intermediate representation is created from the given source program; this phase is also known as the front-end
111
of compiler. The parts of this phase are Lexical Analyzer, Syntax Analyzer, and Semantic
Analyzer.
Fig. 10.1 Parts of compiler
In the synthesis phase, an equivalent target program is created from the intermediate
representation. It is also known as the back-end of the compiler. The parts of this phase are the
Intermediate Code Generator, Code Generator, and Code Optimizer. Each phase transforms the
source program from one representation into another. (See Fig. 10.1) The phases also
communicate with the error handler and the symbol table.
10.2 Phases of Compiler Design
A compiler can have many phases and passes. A pass is a traversal of the compiler over the
whole program, while a phase is a distinguishable stage that takes input from the preceding
stage, processes it, and produces output that can be used as input for the next stage. A pass can
have more than one phase. The compilation process is thus a series of phases in which each
phase takes input from its preceding stage, in that stage's own representation of the source
program, and serves its output to the next phase of the compiler. The phases of a compiler are
shown in the figure below. (See Fig. 10.2)
Fig. 10.2 Phases of compiler
10.2.1 Lexical Analysis
Lexical analysis or scanning is the first phase of a compiler. It takes the modified source code
from language preprocessors, written in the form of sentences. The lexical analyzer breaks this
text into a sequence of tokens, removing any whitespace and comments in the source code. If the
lexical analyzer finds an invalid token, it generates an error. The lexical analyzer works closely
with the syntax analyzer: it reads character streams from the source code, checks for legal
tokens, and passes the data to the syntax analyzer when it demands.
Tokens
A token is a sequence of characters that can be treated as a single logical entity. In a
programming language, keywords, constants, identifiers, strings, numbers, operators, and
punctuation symbols can be considered tokens.
Specifications of Tokens
Tokens are specified in terms of the following:
a) Alphabets
Any finite set of symbols is an alphabet: {0,1} is the set of binary alphabets,
{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the set of hexadecimal alphabets, and {a-z, A-Z} is the set
of English-language alphabets.
b) Strings
Any finite sequence of alphabet symbols is called a string. The length of a string is the total
number of occurrences of alphabet symbols in it; for example, the length of the string true is 4,
denoted |true| = 4. A string having no symbols, i.e., a string of zero length, is known as the
empty string and is denoted by ε (epsilon).
c) Special Symbols
A typical high-level language contains the following symbols:
Arithmetic symbols: Addition (+), Subtraction (-), Division (/), Multiplication (*), Modulus
(%).
Punctuation: Comma (,), Semicolon (;), Dot (.), Arrow (->).
Assignment: =
Special Assignment: +=, -=, *=, /=
Comparison: ==, !=, <, <=, >, >=
Preprocessor: #
Location Specifier: &
Logical: &, &&, |, ||, !
Shift Operators: >>, >>>, <<, <<<
Lexemes
Lexemes are sequences of (alphanumeric) characters in a token. There are predefined rules for
every lexeme to be identified as a valid token. These rules are defined by grammar rules, by
means of a pattern. A pattern explains what can be a token, and these patterns are defined by
means of regular expressions.
Fig.10.3. Lexical Analyzer
Example: A := B + 10 yields the tokens:
A and B identifiers
:= assignment operator
+ add operator
10 a number
A lexical analyzer puts information about identifiers into the symbol table. Regular expressions
are used to describe tokens, and a Deterministic Finite Automaton (DFA) is used in the
implementation of the lexical analyzer. A language is considered a finite set of strings over a
finite set of alphabet symbols. Computer languages are considered finite sets, and set operations
can be performed on them mathematically. Finite languages can be described by regular
expressions. The lexical analyzer needs to scan and identify only the finite set of valid
strings/tokens/lexemes that belong to the language, and it searches for the patterns defined by
the language rules.
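As a concrete illustration, pattern-based tokenization can be sketched with Python's re module. The token names and patterns below are hypothetical, chosen only to match the A := B + 10 example above; a real scanner would cover the full language.

```python
import re

# Hypothetical token patterns; order matters where patterns overlap.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ASSIGN", r":="),
    ("PLUS",   r"\+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("SKIP",   r"\s+"),            # whitespace is discarded, not emitted
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Return (token_name, lexeme) pairs; raise on an invalid character."""
    pos, tokens = 0, []
    while pos < len(source):
        m = MASTER.match(source, pos)
        if not m:
            raise SyntaxError(f"invalid character at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("A := B + 10"))
```

Running this on the example statement produces the identifier, assignment-operator, add-operator, and number tokens described in the text, with whitespace removed.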
Regular expressions can express finite languages by defining a pattern for finite strings of
symbols. The grammar defined by regular expressions is known as regular grammar, and the
language defined by a regular grammar is known as a regular language. Regular expressions are
an important notation for specifying patterns. Each pattern matches a set of strings, so regular
expressions serve as names for sets of strings. Programming language tokens can be described
by regular languages. Regular languages are easy to understand and have efficient
implementations.
A finite automaton is a state machine that takes a string of symbols as input and changes its
state accordingly; finite automata are recognizers for regular expressions. When a string is fed
into a finite automaton, the automaton changes its state for each literal. If the input string is
processed successfully and the automaton reaches a final state, the string is accepted, i.e., the
string just fed was a valid token of the language. The mathematical model of a finite automaton
consists of:
Q -> finite set of states.
Σ -> finite set of input symbols.
δ : Q × Σ → Q -> transition function.
q0 ∈ Q -> start state.
F ⊆ Q -> set of accepting states.
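The five-tuple above can be simulated directly. The sketch below uses a hypothetical DFA (not one from this lesson) that accepts binary strings containing an even number of 1s; an input is accepted exactly when the run ends in a state belonging to F.

```python
# Hypothetical DFA: accepts binary strings with an even number of 1s.
Q     = {"even", "odd"}               # finite set of states
SIGMA = {"0", "1"}                    # input alphabet
DELTA = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd",   ("odd", "1"): "even"}  # transition function
Q0    = "even"                        # start state
F     = {"even"}                      # accepting states

def accepts(w):
    """Run the DFA on w; accept iff the run ends in a state of F."""
    state = Q0
    for ch in w:
        if ch not in SIGMA:           # symbol outside the alphabet: reject
            return False
        state = DELTA[(state, ch)]
    return state in F

print(accepts("1001"))  # True: two 1s
print(accepts("10"))    # False: one 1
```

The same table-driven loop underlies generated scanners: only Q, Σ, δ, q0, and F change.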
10.2.2 Syntax Analysis
Syntax analysis or parsing is the second phase of a compiler. A lexical analyzer can identify
tokens with the help of regular expressions and pattern rules. However, a lexical analyzer
cannot check the syntax of a given sentence, due to the limitations of regular expressions:
regular expressions cannot check balancing tokens such as parentheses. Therefore, this phase
uses context-free grammar (CFG), which is recognized by push-down automata. CFG is a
superset of regular grammar: every regular grammar is also context-free, but there exist
problems beyond the scope of regular grammar. CFG is a helpful tool for describing the syntax
of programming languages.
A context-free grammar has four components:
Non-terminals set (V)
Terminal symbols or set of tokens (Σ)
Production rule (P)
Start symbol (S)
Syntax analyzers
A syntax analyzer or parser takes its input from the lexical analyzer in the form of token
streams. The parser analyzes the source code (token stream) against the production rules to
detect any errors in the code. (See Fig. 10.4) The output of this phase is a parse tree. A parse
tree is a graphical depiction of a derivation, and a derivation is a sequence of production rules
used to obtain the input string. It shows how strings are derived from the start symbol; the start
symbol of the derivation becomes the root of the parse tree. In a parse tree, all leaf nodes are
terminals, all interior nodes are non-terminals, and in-order traversal gives the original input
string. Ambiguity matters here: a grammar is said to be ambiguous if it has more than one parse
tree (left or right derivation) for at least one string. If the input is scanned and replaced with
production rules from left to right, it is called a left-most derivation; if it is scanned and
replaced from right to left, it is called a right-most derivation. The parser accomplishes two
tasks: parsing the code while looking for errors, and producing a parse tree as the output of the
phase. Parsers are expected to parse the whole code even if several errors exist in the program.
Fig. 10.4 Syntax Analyzer
Limitations of syntax analyzers
Syntax analyzers receive their input as tokens from lexical analyzers, and lexical analyzers are
responsible for the validity of the tokens they supply. Syntax analyzers have the following
drawbacks:
They cannot determine whether a token is valid.
They cannot determine whether a token is declared before it is used.
They cannot determine whether a token is initialized before it is used.
They cannot determine whether an operation performed on a token type is valid or not.
Syntax analyzers follow production rules defined by a context-free grammar. The way the
production rules are implemented (derivation) divides parsing into two types: top-down parsing
and bottom-up parsing.
Fig. 10.5 Types of Parsing
Top-down parsing
When the parser starts constructing the parse tree from the start symbol and then tries to
transform the start symbol to the input, it is called top-down parsing.
Recursive descent parsing: It is a form of top-down parsing. It is called recursive because it
uses recursive procedures to process the input. It may suffer from backtracking.
Backtracking: If one derivation of a production fails, the syntax analyzer restarts the process
using different rules of the same production. This technique may process the input string more
than once to determine the right production.
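A minimal sketch of recursive descent with backtracking is shown below, for the hypothetical grammar S -> 'a' S 'b' | 'a' 'b' (which generates a^n b^n, n ≥ 1; this grammar is illustrative, not one from the lesson). Each alternative of S is tried in order, and on failure the parser backs up and tries the next one.

```python
def parse_S(tokens, pos):
    """Try each production of S in order; return the position just past
    the matched S, or None if both alternatives fail at pos."""
    # Production 1: S -> 'a' S 'b'
    if pos < len(tokens) and tokens[pos] == "a":
        sub = parse_S(tokens, pos + 1)
        if sub is not None and sub < len(tokens) and tokens[sub] == "b":
            return sub + 1
    # Backtrack and try production 2: S -> 'a' 'b'
    if pos + 1 < len(tokens) and tokens[pos] == "a" and tokens[pos + 1] == "b":
        return pos + 2
    return None  # both alternatives failed

def parses(s):
    """Accept s iff the whole string derives from S."""
    return parse_S(list(s), 0) == len(s)

print(parses("aabb"))  # True
print(parses("abb"))   # False
```

Note that the same prefix may be scanned more than once, which is exactly the cost of backtracking described above.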
Bottom-up parsing
Bottom-up parsing starts from the leaf nodes of a tree and works upward until it reaches the
root node.
Shift-Reduce Parsing: Shift-reduce parsing uses two distinct steps for bottom-up parsing,
known as the shift step and the reduce step. The shift step is the advancement of the input
pointer to the next input symbol, which is called the shifted symbol; this symbol is pushed onto
the stack. The shifted symbol is treated as a single node of the parse tree. When the parser finds
a complete grammar rule (RHS) and replaces it with its (LHS), this is known as the reduce step.
It occurs when the top of the stack contains a handle: a POP operation is performed on the
stack, which pops off the handle and replaces it with the LHS non-terminal symbol.
LR Parser: The LR parser is a non-recursive, shift-reduce, bottom-up parser. It handles a
broad class of context-free grammars, which makes it a very powerful syntax analysis
technique. LR parsers are also known as LR(k) parsers, where L stands for left-to-right
scanning of the input stream, R stands for the construction of a right-most derivation in
reverse, and k denotes the number of lookahead symbols used to make decisions.
10.2.3 Semantic Analysis
We have seen how a parser builds parse trees in the syntax analysis phase. The plain parse tree
constructed in that phase is generally of little use to a compiler, since it does not carry any
information about how to evaluate the tree. The productions of the context-free grammar,
which make up the rules of the language, do not specify how to interpret them.
Semantics
Semantics of a language provide meaning to its constructs, such as tokens and syntax structure.
Semantics help interpret symbols, their types, and their relations with each other. Semantic
analysis judges whether the syntax structure constructed in the source program derives any
meaning or not.
CFG + semantic rules = Syntax Directed Definitions
The following tasks should be performed in semantic analysis:
Type checking
Array bound checking
Scope resolution
Semantic Errors
Some of the semantic errors that the semantic analyzer is expected to recognize are:
Type mismatch.
Undeclared variable.
Misuse of a reserved identifier.
Multiple declarations of a variable in a scope.
Accessing a variable out of its scope.
Actual and formal parameter mismatch.
Attribute Grammar
Attribute grammar is a special form of context-free grammar in which additional information,
called attributes, is attached to one or more of its non-terminals in order to provide
context-sensitive information. Each attribute has a well-defined domain of values, such as
integer, float, character, string, or expression. Attribute grammar is used to provide semantics
to the context-free grammar, and it can help specify both the syntax and the semantics of a
programming language. Viewed on a parse tree, an attribute grammar can pass values or
information along the nodes of the tree.
Semantic attributes
Attributes may be assigned values from their domain at parsing time and evaluated at the time
of assignment or at conditions. Based on the way attributes get their values, they can be broadly
divided into two categories: synthesized attributes and inherited attributes.
Synthesized Attributes
These attributes get values from the attribute values of their child nodes.
Inherited Attributes
In contrast to synthesized attributes, inherited attributes take their values from the parent node
or from siblings.
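Synthesized attributes can be sketched as a bottom-up evaluation over a parse tree: each node's value is computed purely from the values of its children. The tree shape below is a hypothetical expression tree built by hand, not one produced by this lesson's grammars.

```python
# Each node is either a leaf integer or a tuple (op, left, right).
# The synthesized attribute 'val' flows upward from children to parent.
def val(node):
    if isinstance(node, int):         # leaf: attribute comes from the token
        return node
    op, left, right = node
    l, r = val(left), val(right)      # children's synthesized values
    if op == "+":
        return l + r
    if op == "*":
        return l * r
    raise ValueError(f"unknown operator {op!r}")

# Hand-built parse tree for 2 + 3 * 4.
tree = ("+", 2, ("*", 3, 4))
print(val(tree))  # 14
```

An inherited attribute would instead be passed downward as an extra argument to the recursive call (for example, a type inherited from a declaration).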
10.2.4 Intermediate Code Generation
Intermediate code generation is the final phase of the front end of the compiler. Intermediate
code must be generated before the target code: without an intermediate representation, every
source language would need a native compiler for every target machine, which is not practical.
Code optimization techniques can be applied directly on the intermediate code to improve
performance, and the intermediate code is simple to translate into assembly code. The input to
this phase is a parse tree or abstract syntax tree.
Types of Intermediate Representation:
Syntax tree
Postfix notation
Three address Code
For three-address code generation, the semantic rules are similar to those used for constructing
syntax trees and postfix notation.
Syntax tree
A syntax tree represents the hierarchical structure of the input source program. A DAG
(Directed Acyclic Graph) provides the same information in a more compact form, because
common sub-expressions are identified. A syntax tree for the statement a := t*-s + t*-s appears
in Fig. 10.6.
Fig 10.6 (a): syntax tree Fig10.6 (b): DAG
Postfix Notation
The postfix notation of a tree is a linear representation of the syntax tree. It is a list of the
tree's nodes in which each node appears immediately after its children. The postfix notation of
the syntax tree in Fig. 10.6 is as follows:
a t s uniminus * t s uniminus * + assign … (1.1)
In postfix notation the edges of the tree do not appear directly. The edges can be easily identified
from the order of the nodes and the number of operands.
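The node-after-children ordering is exactly a post-order traversal. The sketch below applies it to a hand-built syntax tree for a := t*-s + t*-s, using "uniminus" for unary minus as in (1.1); the tuple representation of tree nodes is an assumption made for the example.

```python
def postfix(node):
    """Post-order traversal: each node is emitted after its children."""
    if isinstance(node, str):              # leaf: a name
        return [node]
    op, *children = node
    out = []
    for c in children:
        out.extend(postfix(c))             # children first
    out.append(op)                         # then the node itself
    return out

# Hand-built syntax tree for  a := t * -s + t * -s
neg_s = ("uniminus", "s")
tree = ("assign", "a", ("+", ("*", "t", neg_s), ("*", "t", neg_s)))
print(" ".join(postfix(tree)))  # a t s uniminus * t s uniminus * + assign
```

The output reproduces line (1.1): the edges of the tree are implicit in the node order and operand counts.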
Three address codes
Three-address code is a sequence of statements of the form:
a := b op c
where a, b, and c are names, constants, or compiler-generated temporary variables, and op
represents the operation to be performed, such as a boolean, arithmetic, or logical operation.
There is at most one operator on the right-hand side of each statement.
Implementation of Three Address Statements
A three-address statement is an abstract view of the intermediate code. Such statements can be
implemented as records with fields for the operator and the operands. There are three such
representations:
Quadruples
Triples
Indirect Triples
A quadruple is a four-field record structure with the fields op, arg1, arg2, and result. The op
field holds the code for the operator to be applied, arg1 and arg2 hold the operands, and result
holds the result of each stage. Fig. 10.7 shows the quadruples for the assignment statement
given below:
a := t*-s + t*-s … (10.1)
Position  op        arg1  arg2  result
(0)       uniminus  S           T1
(1)       *         T     T1    T2
(2)       uniminus  S           T3
(3)       *         T     T3    T4
(4)       +         T2    T4    T5
(5)       :=        T5          A
Fig 10.7 Quadruples
Here T1, T2, …, T5 are used to store the temporary results at different stages, and the final
result is stored in the variable "a".
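The quadruples of Fig. 10.7 can be produced by a post-order walk of the syntax tree that allocates a fresh temporary for each interior node. The helper below (gen_quads and the tuple tree format are illustrative assumptions, not from the text) reproduces that table for the expression in (10.1).

```python
import itertools

_temps = itertools.count(1)            # fresh temporary names T1, T2, ...

def gen_quads(node, quads):
    """Return the location holding node's value, appending quadruples."""
    if isinstance(node, str):
        return node                    # a bare name needs no code
    op, *args = node
    locs = [gen_quads(a, quads) for a in args]
    result = f"T{next(_temps)}"        # one temporary per interior node
    a1 = locs[0]
    a2 = locs[1] if len(locs) > 1 else ""
    quads.append((op, a1, a2, result))
    return result

# Syntax tree for the right-hand side of (10.1): t*-s + t*-s
neg_s = ("uniminus", "s")
expr = ("+", ("*", "t", neg_s), ("*", "t", neg_s))
quads = []
quads.append((":=", gen_quads(expr, quads), "", "a"))
for row in quads:
    print(row)
```

Note that, unlike a DAG, this tree walk emits the sub-expression t*-s twice (temporaries T1–T2 and T3–T4), matching Fig. 10.7.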
Triples are used to avoid temporary variables: instead of storing results in temporaries, the
position of the statement that computes a value is used to refer to it. The triple representation of
statement (10.1) is shown in Fig. 10.8.
Position  op        arg1  arg2
(0)       uniminus  S
(1)       *         T     (0)
(2)       uniminus  S
(3)       *         T     (2)
(4)       +         (1)   (3)
(5)       assign    A     (4)
Fig.10.8 Triples
Indirect triple representation uses a list of pointers to triples instead of listing the triples
directly. The indirect triple representation of (10.1) is shown in Fig. 10.9.
Fig 10.9(a) Pointers Fig 10.9 (b) Indirect triples
10.2.5 Code Generation
Code generation is the final phase of compiler design. It takes intermediate code as its source
program and produces the target program as output. The aims of this phase are to preserve the
semantic meaning of the input program and to make effective use of the available resources.
The position of the code generator in the compiler design is shown in Fig. 10.10.
Fig. 10.9(a) Pointers:
Position  statement
(0)       (14)
(1)       (15)
(2)       (16)
(3)       (17)
(4)       (18)
(5)       (19)
Fig. 10.9(b) Indirect triples:
Position  op        arg1  arg2
(14)      uniminus  s
(15)      *         t     (14)
(16)      uniminus  s
(17)      *         t     (16)
(18)      +         (15)  (17)
(19)      assign    a     (18)
Fig 10.10 Code generator position
Design Issues in Code Generator:
Code Generator Input: The code generation phase assumes its input is free of errors. The
input is provided by the front end along with symbol table information, and it can be a syntax
tree, DAG, triples, quadruples, postfix notation, byte code, etc.
Output Programs: The target programs are the output of the code generator. The output can
take different forms, such as assembly language, relocatable machine language, or absolute
machine language. Absolute machine code resides at fixed memory locations and can be
executed immediately. Relocatable code allows subprograms to be compiled and placed
separately. Assembly language output makes code generation easier because of its symbolic
instructions.
Memory management: Mapping the names in the source program to the addresses of data
objects at run time is done cooperatively by the code generator and the front end. The type
field in a declaration determines the amount of storage required for the declared variable.
Instruction Selection: The nature of the instruction set also plays a very important role. If the
target machine does not support a data type directly, special handling beyond the general
rule is required. Instruction selection depends on the level of the instructions (low or high)
and their nature (data-type support).
Register allocation and assignment: Selecting the variables that will reside in registers, and
picking the specific register for each variable, is another design issue.
Choice of evaluation order: There exist many evaluation orders, and some require fewer
registers than others. The chosen order should improve the efficiency of the overall process.
Different approaches to code generation: Different algorithms are available for code
generation; choosing the right algorithm after weighing all the pros and cons is also a
challenge.
Run-time storage management is done using stack data structures.
Basic blocks and flow graphs
Basic block: A basic block is a sequence of consecutive statements in which control enters at
the beginning and leaves at the end without halting or branching in between. Algorithms are
available to partition three-address statements into basic blocks.
Transformations on the basic blocks:
Structure-preserving transformations: These include elimination of common
sub-expressions, dead-code elimination (elimination of code that is never used), renaming of
temporary variables, interchange of statements, etc.
Algebraic transformations: Algebraic transformations change the set of expressions
computed by a basic block into an algebraically equivalent set. For example, statements like
Y := Y+0 and Y := Y*1 can simply be eliminated from a basic block.
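The algebraic eliminations just described can be sketched over three-address tuples. The (op, arg1, arg2, result) format below is an illustrative assumption consistent with the quadruples shown earlier.

```python
def simplify(block):
    """Drop algebraic no-ops such as y := y + 0 and y := y * 1."""
    out = []
    for op, a1, a2, res in block:
        if op == "+" and a2 == 0 and a1 == res:
            continue                   # y := y + 0 computes nothing new
        if op == "*" and a2 == 1 and a1 == res:
            continue                   # y := y * 1 computes nothing new
        out.append((op, a1, a2, res))
    return out

block = [("+", "y", 0, "y"),           # y := y + 0  -> eliminated
         ("*", "y", 1, "y"),           # y := y * 1  -> eliminated
         ("+", "x", "y", "z")]         # z := x + y  -> kept
print(simplify(block))                 # [('+', 'x', 'y', 'z')]
```

Because the two dropped statements assign back to their own operand, removing them preserves the values computed by the block.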
Flow Graphs
Control flow information can be added to the basic blocks by constructing a directed graph
called a flow graph. The basic blocks are the nodes of the flow graph. There is an initial node
whose leader is the first statement. A directed edge from b1 to b2 indicates that b2 can
immediately follow b1 in some execution order.
A Code Generation Algorithm:
The input to the algorithm is a sequence of three-address statements constituting a basic block.
For each three-address statement of the form a := b op c, the following steps are carried out.
Step-1: A function getreg is called to find a location L where the result of the three-address
statement will be stored. The location L can be a register or a memory location.
Step-2: The address descriptor of "b" is consulted to find its current location; "b" may be in
memory, in a register, or both. The value of "b" is copied into L.
Step-3: The instruction op c′, L is generated, where c′ is the current location of "c". For
storage, a register is preferred over a memory location. The location of "a" is then updated.
Step-4: The operation is performed and the value is stored in the register. Registers that held
variables no longer needed can be freed to hold the variables of the next instruction.
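A toy version of these steps is sketched below. The getreg here is a deliberately simplified stand-in (a real getreg consults register and address descriptors); the two-register pool and the ADD/MOV mnemonics are assumptions made for the example.

```python
free_regs = ["R1", "R0"]               # hypothetical two-register machine

def getreg():
    """Step 1: pick a location L - a free register if any, else memory."""
    return free_regs.pop() if free_regs else "MEM"

def gen(stmt):
    """Emit assembly-like code for a := b op c (op given as a mnemonic)."""
    a, b, op, c = stmt
    L = getreg()
    code = [f"MOV {b}, {L}",           # step 2: copy b's value into L
            f"{op} {c}, {L}",          # step 3: apply op with c's location
            f"MOV {L}, {a}"]           # step 4: store the result into a
    if L != "MEM":
        free_regs.append(L)            # L is free again afterwards
    return code

print(gen(("x", "b", "ADD", "c")))     # ['MOV b, R0', 'ADD c, R0', 'MOV R0, x']
```

The final store of step 4 is emitted eagerly here; a real code generator would keep the result in the register as long as possible and spill only when needed.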
10.2.6 Code Optimization
Code optimization produces efficient code that executes faster, uses memory efficiently, and
gives better overall performance. The code optimizer is usually placed between the front end
and the code generator. A code optimizer typically:
• works on intermediate code;
• performs control flow analysis;
• performs data flow analysis;
• applies transformations to improve the intermediate code.
There are various techniques used for code optimization:
Peephole optimization: Performed after machine code generation. The code is scanned to
find adjacent instructions that can be replaced by a single instruction or by fewer
instructions.
Loop optimization: Applied to loop statements such as for loops. These optimizations are
very important because programs usually spend a large percentage of their time inside
loops.
Branch optimization: It rearranges the program code to minimize branching logic and to
merge physically separate blocks of code.
Code motion: If variables used inside a loop do not change within the loop, the
computation can be done outside the loop and the resulting values used within it.
Common sub-expression elimination: When the same value is recomputed in a subsequent
expression, the duplicate computation can be removed by reusing the previous result.
Constant propagation: Constants used in expressions are propagated and folded into new
ones. Some implicit conversions between integer and floating-point values are performed.
Dead code elimination: It eliminates code that can never be reached during execution or
whose results are never used.
Dead store elimination: It eliminates stores whose stored value is never referenced again.
For example, if two stores to the same location have no intervening load, the first store is
unnecessary and is removed.
Global register allocation: It assigns variables and expressions to hardware registers with
the help of a "graph coloring" algorithm.
Inlining: It replaces function calls with the body of the called function.
Instruction scheduling: It reorders instructions to reduce execution time.
Interprocedural analysis: It uncovers relations between function calls and removes loads,
computations, and stores that cannot be removed with more straightforward optimizations.
Invariant IF code floating (unswitching): It moves invariant branching code out of loops,
creating opportunities for other optimizations.
Reassociation: It rearranges the sequence of calculations in an array expression, providing
more candidates for common sub-expression elimination.
Store motion: It moves store instructions outside loops.
Strength reduction: It replaces less efficient instructions with more efficient ones.
Value numbering: It involves constant propagation, folding several instructions into a
single instruction, and redundant-expression elimination.
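As one small worked example, constant propagation and folding can be sketched over three-address tuples. The (op, arg1, arg2, result) format and the "copy" operation are illustrative assumptions; the sketch also assumes the folded names are not needed outside this block.

```python
def fold(block):
    """Propagate known constants and fold operations on them."""
    consts, out = {}, []
    for op, a1, a2, res in block:
        a1 = consts.get(a1, a1)        # constant propagation
        a2 = consts.get(a2, a2)
        if op == "copy" and isinstance(a1, int):
            consts[res] = a1           # remember  x := 2
        elif op == "+" and isinstance(a1, int) and isinstance(a2, int):
            consts[res] = a1 + a2      # constant folding: no code emitted
        else:
            out.append((op, a1, a2, res))
    return out

block = [("copy", 2, None, "x"),       # x := 2
         ("+", "x", 3, "y"),           # y := x + 3  -> folded to 5
         ("*", "y", "z", "w")]         # w := y * z  -> y replaced by 5
print(fold(block))                     # [('*', 5, 'z', 'w')]
```

Three statements collapse to one: the copy and the addition disappear, and the constant 5 is substituted directly into the multiplication.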
10.3 Error Recovery
A parser should be able to detect and report any error in the program. When an error is
encountered, the parser should handle it and carry on parsing the rest of the input. The parser is
mostly expected to check for errors, but errors may be encountered at various stages of the
compilation process. A program may have the following kinds of errors at different stages:
Lexical: the name of an identifier typed incorrectly.
Syntactical: a missing semicolon or unbalanced parentheses.
Semantic: a mismatched value assignment.
Logical: unreachable code, an infinite loop.
10.4 Symbol Table
The symbol table is a data structure maintained throughout all the phases of a compiler. All
identifier names, along with their types, are stored in the symbol table. The symbol table makes
it easy for the compiler to quickly search for an identifier record and retrieve it. The symbol
table is also used for scope management. It is an important data structure created and
maintained by compilers in order to store information about the occurrence of various entities
such as variable names, function names, objects, classes, interfaces, and so on. The symbol
table is used by both the analysis and the synthesis parts of a compiler.
A symbol table may serve the following purposes depending upon the language in hand:
To store the names of all entities in a structured form at one place.
To verify whether a variable has been declared.
To implement type checking, by verifying that assignments and expressions in the source code
are semantically correct.
To determine the scope of a name (scope resolution).
A symbol table is simply a table, which can be either linear or a hash table. It maintains an
entry for each name in the following format:
<Symbol name, type, attribute>
For instance, if a symbol table has to store information about the following variable
declaration:
static int interest;
then it stores the entry as:
<interest, int, static>
The attribute field contains the entries related to the name.
If a compiler is to handle only a small amount of data, the symbol table can be implemented as
an unordered list, which is easy to code but suitable only for small tables. A symbol table can
be implemented in one of the following ways:
Linear (sorted or unsorted) list
Binary Search Tree
Hash table
Among these, symbol tables are most commonly implemented as hash tables, where the
source-code symbol itself is treated as the key for the hash function and the return value is the
information about the symbol.
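A hash-table-based symbol table with scope resolution can be sketched with a stack of Python dictionaries (Python's dict is itself a hash table). The class name and the variable names are illustrative; the entry format follows the <name, type, attribute> layout above.

```python
class SymbolTable:
    """One dict (hash table) per scope; lookup walks outward."""
    def __init__(self):
        self.scopes = [{}]                      # global scope

    def enter_scope(self):
        self.scopes.append({})                  # opening a block

    def exit_scope(self):
        self.scopes.pop()                       # closing a block

    def declare(self, name, type_, attribute=""):
        self.scopes[-1][name] = (type_, attribute)   # <name, type, attribute>

    def lookup(self, name):
        for scope in reversed(self.scopes):     # scope resolution
            if name in scope:
                return scope[name]
        return None                             # undeclared

table = SymbolTable()
table.declare("interest", "int", "static")      # static int interest;
table.enter_scope()
table.declare("interest", "float")              # shadows outer declaration
print(table.lookup("interest"))                 # ('float', '')
table.exit_scope()
print(table.lookup("interest"))                 # ('int', 'static')
```

Lookup checks the innermost scope first, which is exactly how an inner declaration shadows an outer one, and how undeclared names (lookup returning None) are detected.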
10.5 Summary
This chapter has described the front end and back end of a compiler. The various phases
(lexical analysis, syntax analysis, semantic analysis, intermediate code generation, and code
generation) have been discussed along with suitable examples, and the interrelationship
between the different phases has been described. Various code optimization techniques for
producing efficient code have also been discussed.
10.6 Glossary
Token: A token is a sequence of characters treated as a unit in the grammar of a programming
language.
Lexeme: A lexeme is a sequence of characters in the program that is matched by the pattern for
a token.
Pattern: A pattern is a rule describing the set of lexemes that represent a particular token.
Symbol table: A symbol table is a data structure used by a language translator such as
a compiler or interpreter.
10.7 Check your progress/self-assessment questions.
Q1 What are the possible error-recovery actions in lexical analysis?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
_______________
Q2 What are the two parts of compilation? Explain briefly.
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
_____________________
Q3 Differentiate between tokens, patterns, and lexemes.
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
____________________________
Q4 What does a semantic analysis do?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
___________________________
Q5 Define ambiguous grammar.
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
____________________________
Q6 What are the issues with top down parsing?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
_____________________________
Q7 What are the advantages of intermediate code generation?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
____________________________
Q8 What are the various types of intermediate code representation?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
___________________________________
Q9 Define symbol table.
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
____________________________
Q10 What are the properties of an optimizing compiler?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
____________________________
10.8 Answers to Check your progress/self-assessment questions.
1. a) Deleting an extraneous character
b) Inserting a missing character
c) Replacing an incorrect character with the correct character
d) Transposing two adjacent characters.
2. Analysis and synthesis are the two parts of compilation. The analysis part breaks the
source program into constituent pieces and creates an intermediate representation of the
source program. The synthesis part constructs the desired target program from the intermediate
representation.
3. Tokens: sequences of characters that have a collective meaning.
Patterns: there is a set of strings in the input for which the same token is produced as
output. This set of strings is described by a rule called a pattern associated with the
token.
Lexeme: a sequence of characters in the source program that is matched by the pattern for a
token.
4. Semantic analysis is one in which certain checks are performed to ensure that components of a
program fit together meaningfully. Mainly it performs type checking.
5. A grammar G is said to be ambiguous if it produces more than one parse tree for
some sentence of the language L(G), i.e., there is more than one leftmost (or rightmost)
derivation for the given sentence.
6. The following problems are associated with top-down parsing:
• Backtracking
• Left recursion
• Left factoring
• Ambiguity
7. A compiler for different machines can be created by attaching a different back end for
each machine to the existing front end. A compiler for different source languages can be
created by providing different front ends for the corresponding source languages to an existing
back end. A machine-independent code optimizer can be applied to the intermediate code in
order to improve the code.
8. There are mainly three types of intermediate code representations.
Syntax tree
Postfix
Three address code
9. A symbol table is a data structure used by the compiler to keep track of the
semantics of variables. It stores information about the scope and binding of names.
10. The compiler should generate the least possible amount of target code. There
should not be any unreachable code, and dead code should be completely removed from the
program. An optimizing compiler should apply the following code-improving transformations:
1. common sub-expression elimination
2. dead code elimination
3. code movement
4. strength reduction
10.9 Model Questions
Q1 Explain various phases of compiler?
Q2 What is the use of regular expressions and DFAs in lexical analysis?
Q3 How lexical analyzer removes white spaces from a source file?
Q4 How do real compilers deal with symbol tables?
Q5 How would you check that no identifier is declared more than once?
Q6 Specify the lexical form of a numeric constant in the C language?
10.10 References/Suggested Readings
4. System programming, John J Donovan, Tata McGraw-Hill Education, 1st Edition.
5. Systems Programming and Operating System, D M Dhamdhere, Tata McGraw-Hill
Education, 1st Edition.
6. System programming, A.A. Puntambekar, Technical publications, 3rd Edition.
Lesson-11 YACC
Structure of the lesson
11.0 Objective
11.1 Introduction to YACC
11.2 Format of a YACC file
11.3 Lex-Yacc Interaction
11.3.1 Header files of code
11.4 Rule Section
11.5 User code section
11.6 YACC Declaration Summary
11.7 Just in time compilers
11.7.1 Functioning Of just in time compilers
11.7.2 Classification of just in time compilers
11.8 Platform independent systems
11.9 Various platforms available
11.9.1 Hardware platforms
11.9.2 Software platforms
11.10 Summary
11.11 Glossary
11.12 Answers to check your progress/self-assessment questions.
11.13 Model questions
11.14 References
11.0 Objective
After studying this lesson, students will be able to:
Explain the concept of YACC and the format used to write a YACC file.
Define just-in-time compilers.
Discuss the functioning of just-in-time compilers and their classification.
Define platform independent systems.
Explain the types of platforms.
11.1 Introduction to YACC
A compiler is a computer program that transforms source code written in a programming language into
another language (the target language, often having a binary form known as object code).
The most common reason for converting source code is to create an
executable program. The name "compiler" is primarily used for programs that translate source
code from a high-level programming language to a lower-level language (e.g., assembly
language or machine code). If the compiled program can run
on a computer whose CPU or operating system is different from the one on which the
compiler runs, the compiler is known as a cross-compiler. More generally, compilers
are a particular kind of translator. A program that translates from a low-level language to a
higher-level one is a decompiler. A program that translates between high-level languages
is usually called a source-to-source compiler or transpiler. A language rewriter is usually a
program that translates the form of expressions without a change of language. The term compiler-
compiler is sometimes used to refer to a parser generator, a tool often used to
help create the lexer and parser.
YACC provides a general tool for imposing structure on the input to a computer program. The YACC
user prepares a specification of the input process; this includes rules describing the input
structure, code to be invoked when these rules are recognized, and a low-level routine to do
the basic input. YACC then generates a function to control the input process. The parser
calls the scanner for tokens, analyzes the syntactic structure according to the
grammar rules, and finally executes the semantic routines.
Figure 11.1 : Working of YACC
11.2 Format of a YACC file
Typically a YACC file consists of three sections, i.e., definitions, rules, and code, which is quite
similar to the format of a lex file. A typical yacc file looks like:
...definitions...
%%
...rules...
%%
...code...
The definitions section comprises all code written
between %{ and %}, which is copied to the beginning of the resulting C file.
The rules section consists of productions, each formed by combining a pattern with
an action; actions are enclosed in braces if there is more than one.
The code section is the main, and usually the longest, part of the file; its most important element
is yylex, the lexical analyzer. If the code section is left out, a default main that calls this
analyzer is used. In some other references the structure of the YACC file is
given as:
%{
C declaration
%}
YACC declaration
%%
Grammar rules
%%
Additional C code
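To make the three sections concrete, here is a hypothetical minimal YACC file for summing numbers. The token NUMBER and the grammar rules are illustrative only, not taken from the text, and a matching lex tokenizer would also be needed before this could be built:

```yacc
/* calc.y - a hypothetical minimal YACC file (illustrative only) */
%{
#include <stdio.h>
int yylex(void);                      /* supplied by a matching lex file */
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUMBER
%%
line : expr              { printf("= %d\n", $1); }
     ;
expr : expr '+' NUMBER   { $$ = $1 + $3; }
     | NUMBER
     ;
%%
/* additional C code, e.g. a main that calls yyparse(), would go here */
```

The %{ ... %} block and the %token line form the definitions section, the two productions form the rules section, and anything after the second %% is the additional C code section.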
To understand how YACC works in practice, consider how it interacts with lex.
11.3 Lex-Yacc Interaction
YACC is responsible for accepting the stream of tokens generated by Lex, which
reads a file of characters and outputs the tokens used by YACC. YACC then performs a certain
number of actions on the received tokens. The yylex routine is repeatedly called by the YACC-generated
parser when a lex program supplies the tokenizer. The lex rules typically work by calling
return each time they have parsed a token. We will now see how lex returns data in a manner
that yacc can use for parsing.
11.3.1 Header files of code
If lex is to return tokens that YACC will process, the two have to agree on
what tokens there are. This is done as follows. The yacc file will have token definitions
such as "%token NUMBER" in the definitions section. When the yacc file is translated with
yacc -d, a header file y.tab.h is created that has definitions like "#define NUMBER 258". This
file can then be included in both the lex and YACC programs. The lex file can then call
return NUMBER, and the yacc program can match on this token. The token codes that are
assigned from %token definitions typically start at around 258, so that
single characters can simply be returned as their integer value:
/* in the lex program */
[0-9]+ { return NUMBER; }
[-+*/] { return *yytext; }
/* in the yacc program */
sum : TERM '+' TERM
11.4 Rule Section
The rules section contains the grammar of the language you want to parse. This looks like:
name1 : THING something OTHERTHING {action}
| othersomething THING {other action}
name2 : .....
This is the general form for context-free grammars, with a set of
actions associated with each matching right-hand side. It is a good convention to keep non-
terminals (names that can be expanded further) in lower case and terminals (the symbols that are
finally matched) in upper case. The terminal symbols correspond to return codes from the
lex tokenizer. They are usually described by %token definitions in the YACC
program, or are character values.
11.5 User code section
The minimal main program is
int main()
{
yyparse();
return 0;
}
Extensions to more ambitious programs should be self-evident. In addition to the
main program, the code section will usually also contain subroutines
to be used either in the YACC or the lex program.
11.6 YACC Declaration Summary
`%start' Specify the grammar's start symbol.
`%union' Declare the collection of data types that semantic values may have.
`%token' Declare a terminal symbol (token type name) with no precedence or associativity specified.
`%type' Declare the type of semantic values for a nonterminal symbol.
`%right' Declare a terminal symbol (token type name) that is right-associative.
`%left' Declare a terminal symbol (token type name) that is left-associative.
`%nonassoc' Declare a terminal symbol (token type name) that is non-associative (using it in a
way that would be associative is a syntax error).
11.7 Just In Time compilers
In computing, just-in-time (JIT) compilation, also called dynamic translation, is a
technique to improve the runtime performance of computer programs based on byte code.
Since byte code is interpreted, it executes more slowly than compiled machine code, unless it is
actually translated to machine code, which could be performed before execution, making the
program loading slow, or during execution. In the latter case, which is the basis of
JIT compilation, the program is stored in memory as byte code, but the code section
currently running is compiled to physical machine code so that it runs faster.
JIT compilers represent a hybrid approach, with
translation occurring continuously, as with interpreters, but with caching of translated
code to minimize performance degradation. JIT also offers other
advantages over statically compiled code, such as the handling of late-bound
data types and the ability to enforce security guarantees. JIT builds on two earlier
ideas in run-time environments: byte code compilation and dynamic compilation. It
converts code at runtime prior to executing it natively, for instance byte code into
native machine code. Several modern runtime environments, such as
Microsoft's .NET Framework and most implementations of Java, rely on JIT compilation for fast code
execution. In the Java programming language and environment, a just-in-time (JIT) compiler is a
program that turns Java byte code (a program that contains instructions that must be interpreted)
into instructions that can be sent directly to the processor. After you have written a Java
program, the source language statements are compiled by the Java compiler into byte code instead
of into code that contains instructions matching a particular hardware platform's processor (for
instance, an Intel Pentium chip or an IBM System/390 processor). The byte code is platform-independent
code that can be sent to any platform and run on that platform. Previously,
most programs written in any language had to be recompiled, and sometimes rewritten,
for each computer platform. One of the greatest advantages of Java is that you need to write and
compile a program only once. The Java runtime on any platform will translate the compiled byte code into
instructions understandable by the particular processor. However, the virtual machine handles one byte
code instruction at a time. Using the Java just-in-time compiler (really a second compiler) on
the particular system platform compiles the byte code into the particular system code (as
though the program had been compiled initially on that platform). Once the code has
been recompiled by the JIT compiler, it will usually run more quickly on the computer. The just-in-time
compiler comes with the virtual machine and is used optionally. It compiles the byte code into
platform-specific executable code that is immediately executed. Sun Microsystems notes that it is
usually faster to choose the JIT compiler option, especially if the executable code is
repeatedly reused.
Figure 11.2 Structure of JIT compiler
11.7.1 Functioning Of Just In Time Compilers
In a byte-code-compiled system, source code is translated to an intermediate representation
known as byte code. Byte code is not the machine code for any particular computer, and may be
portable among computer architectures. The byte code may then be interpreted by, or run
on, a virtual machine. The JIT compiler reads the byte codes in many sections (or in full,
occasionally) and compiles them dynamically into machine language so the program can
run faster. Java performs runtime checks on various sections of the code,
and this is the reason the entire code is not compiled at once. This can be done per-
file, per-function, or even on any arbitrary code fragment; the code can be compiled when it is
about to be executed (hence the name "just-in-time"), and then cached and
reused later without needing to be recompiled. In contrast, a traditional interpreted virtual
machine will simply interpret the byte code, generally with much lower performance. Some
interpreters even interpret source code, without the step of first compiling to byte code, with
even worse performance. Statically compiled code or native code is compiled prior
to deployment. A dynamic compilation environment is one in which the compiler can be used
during execution. For instance, most Common Lisp systems have a compile function which can compile new
functions created during the run. This gives many of the advantages of JIT, but the
programmer, rather than the runtime, is in control of which parts of the code are
compiled. This can also compile dynamically generated code, which can, in many
circumstances, give substantial performance advantages over statically compiled code,
and over most JIT systems. A common goal of using JIT techniques is to
reach or surpass the performance of static compilation, while keeping the advantages of byte
code interpretation: much of the "heavy lifting" of parsing the original source code and performing
basic optimization is handled at compile time, prior to deployment; compilation from
byte code to machine code is much faster than compiling from source. The deployed byte code is
portable, unlike native code. Since the runtime has control over the
compilation, as with interpreted byte code, it can run in a secure sandbox. Compilers
from byte code to machine code are easier to write, because the portable
byte code compiler has already done much of the work.
JIT code generally offers far better performance than interpreters. In addition, it can
sometimes offer better performance than static compilation, as many optimizations are only
possible at run-time:
• The compilation can be optimized to the targeted CPU and the operating system model where
the application runs. For instance, JIT can select SSE2 vector CPU instructions when it
detects that the CPU supports them. However, there is currently no mainstream JIT that
implements this. To get this level of optimization specificity with a static compiler, one must either
compile a binary for each intended platform/architecture combination, or else include multiple
versions as parts of the code within a single binary.
• The system is able to collect statistics about how the program is actually
running in its environment, and it can rearrange and recompile for optimal performance. However,
some static compilers can also take profile information as input.
• The system can do global code optimizations (e.g. inlining of library functions) without
losing the advantages of dynamic linking and without the overheads inherent to static compilers
and linkers. Specifically, when doing global inline substitutions, a static compilation
technique may need run-time checks to ensure that a virtual call would occur if the
actual class of the object overrides the inlined method, and bounds-condition
checks on array accesses which must be handled within loops. With just-in-time
compilation, in many cases this processing can be moved out of loops,
often giving large increases in speed.
• Although this is possible with statically compiled garbage-collected languages, a byte code
system can more easily rearrange executed code for better cache utilization.
11.7.2 Classification of just in time compilers
In surveying JIT work, some common attributes emerge. We suggest
that JIT designs can be classified by three properties:
(1) Invocation: A JIT compiler is explicitly invoked if the user must take some
action to cause compilation at runtime. An implicitly invoked JIT compiler is transparent to the
user.
(2) Executability: JIT systems typically involve two languages: a source language to
translate from, and a target language to translate to. These languages can be the same, if the
JIT system is only performing optimization on the fly; and unlike the continuous optimization
of Kistler [2001], compilation here happens only once. We call a JIT system mono-executable if
it can execute only one of these languages, and poly-executable if it can execute more than one.
Poly-executable JIT systems have the advantage of being able to choose when compiler invocation is
warranted, since either program representation can be used.
(3) Concurrency: This property describes how the JIT compiler executes in relation to the
program itself. If program execution pauses under its own volition to
permit compilation, the JIT compiler is not concurrent; in this case it may be
invoked by means of a subroutine call, message transmission, or transfer of control to a co-
routine. In contrast, a concurrent JIT compiler can operate as the program executes
concurrently, in a separate thread or process, or even on a separate processor. JIT systems that
function under hard real-time constraints may constitute a fourth classifying property, but there appears to be
little research in this area at present. It is unclear whether hard real-time constraints pose
any unique problems for JIT systems. A few patterns are evident. For example, implicitly
invoked JIT compilers are clearly predominant in recent work. Executability varies from system to
system, but this is more a matter of design than of JIT technology. Work on
concurrent JIT compilers is just beginning, and will likely
grow in importance as processor technology develops.
11.8 Platform independent systems
Computer software or computing methods and concepts that are implemented and work across
multiple computer platforms are said to be platform independent. Platform-independent software may be
divided into two types: one requires individual building or compilation for each platform that it
supports, and the other can be run directly on any platform without special
preparation, e.g., software written in an interpreted language or pre-compiled portable
byte code for which the interpreters or run-time packages are common or standard components of all platforms.
For example, a cross-platform application may run on Microsoft Windows on the x86
architecture, Linux on the x86 architecture, and Mac OS X on either PowerPC- or x86-
based Apple Macintosh systems. Cross-platform programs may run on as many as all
existing platforms, or on as few as two platforms.
Platform independence in software means that you can run some code with little or no modification
on multiple platforms.
It depends on what you define as "the platform". Sometimes this may be a specific
hardware/machine architecture. In other cases it may be a "generic PC". In still other cases it
may be a virtual machine and runtime environment (which is the case with Java).
Nothing is "perfectly" platform independent; there are always a few corner cases that can catch
you out. For example, if you hardcode file path separators rather than using the platform-
independent File.pathSeparator in Java, then your code will not work on both Windows
and Linux. As a software engineer, you need to watch out for these things, always use the
platform-independent option where possible, and test properly on different platforms if you care about
portability.
There are always a few restrictions on specific platforms that cannot be ignored. Examples
are things like the maximum length of filenames, or the available RAM on a system.
No matter how hard you try to be platform independent, your code may fail if you
try to run it on a platform that is too tightly constrained. It is important to note that some
languages are platform independent at the source code level (C/C++ is a good example) but lose
platform independence once the code is compiled (since native code is platform specific). Java retains platform
independence even after code is compiled because it compiles to platform-independent
byte code (the actual translation to native code is handled at a later time, after the
byte code is loaded by the JVM). There are occasionally bugs in language implementations that only
occur on particular platforms. So even though your code is theoretically
100% portable, you still need to test it on different platforms to confirm you are not
running into any surprises.
In Java, platform independence can be seen at several levels:
Java code is platform independent in the sense that the same Java application or algorithm (typically compiled
to Java byte code and packaged in a .jar file) will run identically on Windows and
Linux.
Java libraries (e.g. all the many open source toolsets) are generally platform independent, as long
as they are written in pure Java. Most libraries try to stay with pure
Java in order to maintain platform independence, but there are a few circumstances
where this is impossible (e.g. if the library needs to interface directly with special hardware,
or call a C/C++ library that uses native code).
The Java platform/runtime environment is platform independent in the sense that the same libraries (graphics,
networking, file I/O, etc.) are available and work in the same way on all platforms.
This is done deliberately to allow applications that use these libraries to be able to
run on any platform. For example, the Java libraries that access the file
system know that Windows and Linux use different filename path
separators, and take care of this for you. Of course, this implies that under the hood the runtime
environment makes use of platform-specific features, so you need a different JRE for each platform.
The JVM itself (i.e. the Java Virtual Machine that is responsible for JIT compiling and running
Java byte code) is platform independent in the sense that it is available on many platforms (everything from
mainframe computers to cell phones). However, specific builds of the JVM are required for
each particular platform to take account of different native instruction codes and machine
capabilities (so you cannot run a Linux JVM on Windows, and vice versa). The JVM is
packaged as a major part of the Java platform/runtime environment as above.
By and large, Java is probably about as close to true platform independence as
you can get, but as can be seen there is still a lot of platform-specific work done under the
hood.
If you stick to 100% pure Java code and libraries, you can regard Java as
being "effectively" platform independent, and it generally fulfils the Write
Once, Run Anywhere promise.
11.9 Various platforms available
The term platform can refer to the type of processor and/or other hardware on which a given operating
system or application runs, the type of operating system on a computer, or the combination of the type
of hardware and the type of operating system running on it. An example of a common
platform is Microsoft Windows running on the x86 architecture. Other well-known
desktop computer platforms include Linux/Unix and Mac OS X, both of which are themselves
cross-platform. There are many devices, such as mobile phones, that are also
effectively computer platforms but are less commonly thought of in that way. Application
software can be written to depend on the features of a particular platform: the
hardware, operating system, or virtual machine it runs on. The Java platform is a
virtual machine platform which runs on many operating systems and hardware
types, and is a common platform for software to be written for.
11.9.1 Hardware platforms
A hardware platform can refer to a computer's underlying architecture or processor family.
For example, the x86 architecture and its variants, such as IA-32 and x86-64. These
machines often run one version of Microsoft Windows, though they can run other
operating systems as well, including Linux, OpenBSD, NetBSD, Mac OS X and FreeBSD. The
ARM architecture is common on mobile phones and tablet PCs, which run Android, iOS
and other mobile operating systems.
11.9.2 Software platforms
A software platform can be either an operating system or a programming environment, though
more commonly it is a combination of both. A notable exception to this is Java, which uses an
operating-system-independent virtual machine for its compiled code, referred to in the world of
Java as byte code. Examples of software platforms include Linux, Java, Microsoft Windows, and DOS-type
systems.
11.10 Summary
YACC provides a general tool for imposing structure on the input to a computer program. The YACC user
prepares a specification of the input process, which includes rules describing the input
structure, code to be invoked when these rules are recognized, and a low-level routine to do
the basic input. YACC then generates a function to control the input process. The parser
calls the scanner for tokens, analyzes the syntactic structure according to the
grammar rules, and finally executes the semantic routines. In computing, just-in-time
(JIT) compilation, also known as dynamic translation, is a method to improve the runtime
performance of computer programs based on byte code (virtual machine code). Since byte code is
interpreted, it executes more slowly than compiled machine code, unless it is actually translated to
machine code, which could be performed before execution (making program loading
slow) or during execution. In the latter case, which is the basis of JIT compilation, the program is
stored in memory as byte code, but the code section currently running is compiled
to physical machine code so that it runs faster. Platform
independence in software means that you can run some code with little or no modification on
multiple platforms. It depends on what you define as "the platform". Sometimes this may be a
specific hardware/machine configuration. In other cases it might be a "generic PC". In
still other cases it may be a virtual machine and runtime environment (which is the case
with Java).
11.11 Glossary
YACC: YACC provides a general tool for imposing structure on the input to a computer program. The
YACC user prepares a specification of the input process, which includes rules describing
the input structure, code to be invoked when these rules are recognized, and a low-level
routine to do the basic input. YACC then generates a function to control the input
process.
JIT: In computing, just-in-time (JIT) compilation, also known as dynamic translation, is a
method to improve the runtime performance of computer programs based on byte code (virtual machine
code).
Platform independent systems: Platform independence in software means that you can run some
code with little or no modification on multiple platforms.
Hardware platforms: A hardware platform can refer to a computer's underlying architecture or processor
family.
Software platforms: A software platform can be either an operating system or a programming
environment, though more commonly it is a combination of both.
Check Your Progress/Self-Assessment Questions
Q1) Explain lex and yacc tools?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
___________________________________
Q2) Give the structure of the lex program?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
___________________________________
Q3) What is an internal command? Give an example?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
___________________________________
Q4) What is an exit status?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
___________________________________
Q5) Give the structure of the Yacc program?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
_____________________________
11.12 Answers to Check Your Progress/Self-Assessment Questions
Ans 1. Lex generates a scanner that can identify the tokens of a language.
Yacc generates a parser: Yacc takes a concise description of a grammar and produces a C routine that can
parse that grammar.
Ans 2. Definitions section - any initial C program code
%%
Rules section - patterns and actions separated by white space
%%
User subroutines section - consists of any legal C code
Ans 3. An internal command is a command that is built into the shell, e.g. echo.
Ans 4. exit 0 - return success; the command executed successfully.
exit 1 - return failure.
Ans 5: A typical YACC file looks like:
...definitions...
%%
...rules...
%%
...code...
11.13 Model Questions
Q1) Explain the procedure for executing a lex program and yacc program?
Q2) What is the difference between Lex and Yacc?
Q3) What is the treatment of lines written between the first %{ and %} ?
Q4) For what purpose is the production rules section used?
Q5) How does Yacc "understand" the input language and decide what action to take for each "sentence"?
11.14 References/Suggested Readings
1. System Programming, John J. Donovan, Tata McGraw-Hill Education, 1st Edition.
2. Systems Programming and Operating System, D. M. Dhamdhere, Tata McGraw-Hill
Education, 1st Edition.
3. System Programming, A. A. Puntambekar, Technical Publications, 3rd Edition.
Lesson 12 Fundamentals of OS
12.0 Objectives
12.1 Introduction
12.2 Operating System: Operating Systems and its functions
12.3 Types of operating systems
12.4 Real-time OS
12.5 Distributed OS
12.6 Mobile OS
12.7 Network OS
12.8 Summary
12.9 Glossary
12.10 Answers to Check Your Progress / Self Assessment Questions
12.11 References / Suggested Readings
12.12 Model Questions
12.0 Objectives
After reading this lesson you will be able:
To know the meaning and types of Operating Systems
To know about various functions performed by Operating Systems
To know the meaning and importance of RTOS (Real Time Operating Systems)
To know the meaning and importance of Mobile OS and Network OS.
12.1 Introduction
The operating system is the most fundamental and most important part of a computer system. It is a
type of system software. Basically it acts as an interface between the user and the computer and enables
the end user to control his/her computer. There are various types of operating systems, like single
user, multi user, multi tasking, multiprogramming etc. Today, apart from computer operating
systems, mobile operating systems, i.e. operating systems for smart phones, are also available.
12.2 Operating Systems and its functions
12.2.1 Meaning of Operating System
A digital computer system consists of two major components - hardware and software. These
two are complementary to each other. On the software side, the Operating System (OS) is one major and
core software component of any digital computer system. It performs various important tasks, and
in its very basic form it acts as an interface between the end user and the computer system. This is
just one job of the OS; apart from this, it also acts as an interface between the various components of the
computer system. Small modules of the OS which provide an interface to the various devices
attached to the computer system are called "device drivers". These let the user manage and control
the attached devices. An operating system has been defined as "a
program that manages the computer hardware". Thus, in one way it lets the user interact with the
computer system, and in the other it lets devices work in a proper manner inside the computer
system. This is depicted in the following figure.
Fig 12.1 Role of Operating System
Some important objectives of OS are :
1. Convenience: It lets the user use computer system in a convenient and user friendly way.
2. Efficiency: It lets computer system use various hardware and software resources available
efficiently.
3. Ability to evolve: OS is built in such a way that it permits effective development and
introduction of new functions without interfering with other services.
12.2.2 Functions of OS
Various functions performed by Operating System are explained below.
1. Acts as an interface: The first and foremost function of an operating system is to enable the user of
a computer system to use it conveniently and efficiently. For this, the operating system provides
operating system commands and system calls. To be able to use a computer, the user has to learn
these operating system commands. These commands differ depending upon the type of user
interface provided by the OS. The user interface may be a CUI (Command User Interface) or a GUI
(Graphical User Interface). The part of the OS which interprets user commands is called the shell.
2. Process Management: Every software or its part running inside a computer system is called
a process. At any time many processes may be running. Sometimes different processes may
communicate with each other to share some data. This is called IPC (Inter Process
Communication). Further, sometimes a process may be dependent on the outcome of some
other process; for this, process synchronization may be required. This is called IPS (Inter
Process Synchronization). OS provides various software tools to implement and ensure IPC
and IPS.
3. CPU Scheduling: CPU is most important and expensive, but limited resource inside a
computer system. Each process needs CPU to execute its instructions. Processes compete for
using CPU. OS decides which process will use CPU and for how long. This is also known as
processor management. There are various CPU scheduling algorithms inbuilt into the OS.
4. Memory Management: Every software or process that runs inside a computer, is loaded into
internal memory (RAM – Random Access Memory). RAM is a costly hardware and is
available in limited capacity. It must be used very efficiently. Memory may have to be shared
among processes. Which process will be allocated which part of memory, and how much,
is decided by the OS.
5. Disk or Storage Space Management: Storage space refers to external memory. User files
are saved on storage devices because they are non-volatile memory, and space is allocated
there to store them. Storage space allocation, de-allocation, defragmentation, division of disk
space into smaller drives etc. are also performed by the Operating System.
6. Device management: OS manages and controls various Input/Output and Peripheral
devices connected to computer system. This part of OS is called device driver. For
example, it is only the device driver that makes mouse and printer usable. Part of OS that
controls hardware is called kernel.
Check Your Progress / Self Assessment Questions
Que. 1. What are the two types of computer memory?
Que. 2. What is the difference between shell and kernel?
12.3 Types of operating systems
There are various types of OS depending upon capacities, capabilities and usability of OS. Some
popular types are explained below.
Single User and Multi User: Traditional, earlier OS were very small and simple, having
the capability to manage only a single user at a time, whereas modern, contemporary OS
are very intelligent and complex and can manage more than one user simultaneously. An
example of a single user OS is DOS (Disk Operating System) and an example of a multi user OS is
Unix.
Single tasking and multi tasking: Some OS allow only one task at a time; the user can initiate
another task only after the completion and termination of the first one. Memory management is
very easy as there can be only two things in memory: the OS and one user process. Some
other OS, being larger, more complex and intelligent, can manage more than one task
simultaneously; more than one process can be loaded in memory, and therefore memory
needs to be partitioned into multiple parts. Process scheduling also has to be implemented by the
OS to decide which process will use the CPU. This increases CPU utilization. Most of today's
OS are multitasking.
Single threading and multi threading: A thread is a single flow of execution inside a
process. Some multitasking OS are single threaded whereas others may be multithreaded.
Multithreading is multitasking within a process. It increases CPU throughput and also
decreases the average turnaround time of a process.
Check Your Progress / Self Assessment Questions
Ques 3. What is multithreading?
12.4 Real-time OS
An OS can be a general purpose operating system which is used for a computer that runs general
purpose user applications like word processing, spreadsheets, DBMS etc. But, there are some
other operating systems designed to run some special purpose applications. For example, some
processes running inside a computer are very time critical. They have strict time constraints:
they must be completed exactly within an allotted time span or deadline, otherwise some type
of loss may occur. Real time operating systems (RTOS) are built for real time applications. Real time
applications are broadly of two types, soft and hard. In the case of soft real time applications, a little
flexibility in violating the time limit is allowed, but in hard real time applications there is no flexibility
to violate timing constraints. An example of a soft real time application is searching the account
balance of a bank customer who wants to execute a transaction at a bank ATM. He can wait for the
ATM machine to respond for a few minutes only. This is an example of a soft real-time application.
Some other examples of soft real time applications are Videoconference applications, E-
commerce transactions, online gaming, online chatting and IM (Instant Messaging) etc.
Hard real time applications have harder timing constraints with no flexibility or no choice of
increasing time limits. Such an application is considered a failure if it does not complete within
predefined allotted time. Few examples of such hard real-time systems include, robotic systems,
components of pacemakers, anti-lock braking system of an automobile and auto-pilot aircraft
control system etc.
The CPU scheduler of such a Real Time Operating System (RTOS) is also designed to provide a
predictable or deterministic execution time and pattern. This type of operating system is best
suited for embedded systems, as these systems often have real time constraints.
An example of an RTOS is FreeRTOS. It is a real time operating system which is designed
to be small so that it can fit on a microcontroller. A microcontroller is a small, resource-
constrained processor on a single chip that includes the processor core, ROM to store the program
to be executed, and RAM.
Some features of Real Time Operating Systems include :
1. They must meet their timing constraints.
2. They are multitasking operating systems.
3. They are usually small sized so that they can fit on a microcontroller.
4. They need less RAM.
5. RTOS typically use priority-based pre-emptive scheduling algorithms, so that the highest-priority ready task runs immediately.
6. They are fault tolerant.
Check Your Progress / Self Assessment Questions
Ques 4. What is RTOS?
Que. 5. What are the two types of RTOS?
Que. 6. Write any three features of RTOS.
12.5 Distributed OS
Distributed application, distributed system and distributed operating system are different. Let us
understand these one by one. A distributed application is software which is divided into parts,
and parts are executed on different computers connected through a network. These parts interact
with each other in order to achieve a specific and common goal. Being run on multiple
computers in parallel, the parts can complete sooner. A distributed application can also be run in a
client-server environment. Usually, the front end of the application requires less processing
power and runs on the client machine, whereas the back end, being larger and more complex, requires more
processing power and is therefore run on a powerful server computer.
A distributed system is a collection of independent computers that work in a transparent manner
and appears to its users as a single system. Definition of distributed system: “A distributed
system consists of a group of autonomous computers linked to a computer network and equipped
with distributed system software”
Distributed Operating System (DOS) is a design of operating system which manages distributed
applications that are running on multiple computers on a network. Distributed Operating Systems
are an extension of Network Operating Systems. Such an operating system has specific goals like
information sharing, scalability (i.e. the possibility to add components), improved reliability, fault
tolerance etc. Distributed systems are broadly of two types, loosely coupled and tightly coupled.
Loosely coupled systems do not share memory or resources; they communicate through
message passing. Tightly coupled systems share memory and resources. Parallel computing and
cloud computing are special cases of distributed systems.
Advantages of Distributed Systems
Price-performance ratio: A group of separate networked microcomputers provides more
computing power per unit cost than a mainframe computer.
Higher performance: n processors potentially give n times the computational power of a
single machine.
Resource sharing: Limited and expensive resources can be shared among multiple
computers without the need to replicate them for each system.
Scalability: The modular structure of distributed systems makes it easier to add more systems.
Fault tolerance: Replication of processors and resources makes the system fault tolerant.
Check Your Progress / Self Assessment Questions
Que. 7. What are the two types of Distributed Operating System?
Que. 8. Write any two advantages of Distributed Operating Systems.
12.6 Mobile OS
Mobile phones of today are called smart phones and have a microprocessor inside. They are very
close to a digital computer. Just like a digital computer, a mobile phone also has ROM, RAM,
and external memory. It also performs booting when it is started and can run multiple
applications. So just like a PC a mobile phone also needs an Operating System.
Such an OS allows not only smart phones but also tablet PCs, PDAs etc. to run applications. It is
specifically designed for mobile devices and for the mobile applications that run on top of it.
Operating systems for smart phones differ from operating system of a general purpose computer
in many ways. This is basically due to difference in the features of both. A smart phone has
many unique features generally not found in a PC, like a touch screen, screen auto rotation,
finger print recognition, Wi-Fi, GPS mobile navigation, video camera, speech recognition
system etc. Also, mobile phones have comparatively less amount of memory available than a PC.
A mobile phone is like an embedded device that carries its own operating system. The mobile
OS determines which third-party applications called mobile applications can be run on your
smart phone.
There are different types of Operating Systems for mobile phones.
1. Operating systems that are manufacturer-built proprietary OS. This includes
Apple’s iOS for iPhone, iPad etc.
RIM BlackBerry OS for all BlackBerry phones
HP's own Palm webOS for their Palm series of mobiles
2. Third-party proprietary operating systems, i.e. the operating systems developed by some
company other than the manufacturer of the phone. Such operating systems are:
Microsoft Windows Phone 7
Microsoft Windows Mobile
3. Operating systems which are free and open source. Examples in this category
include Google's Android OS, used on many types of smart phones like Samsung smart
phones, and Symbian OS, used on Nokia mobiles.
Some Popular Mobile Operating Systems
1. Android OS by Google Inc.
2. Bada OS by Samsung Electronics
3. BlackBerry OS by RIM
4. iOS by Apple (iPhone, iPad)
5. MeeGo OS by Nokia and Intel
6. Symbian OS by Nokia
7. webOS by Palm/HP
8. Windows Mobile and Windows Phone 7 and 10 by Microsoft
12.7 Network OS
Network Operating System (NOS) is an operating system that has special features for
managing the host computer, i.e. the computer on which it is installed, and other computers on the
network (LAN). It makes it easier to control the various computers on the network. It generally refers
to an OS that enhances a basic operating system by adding networking features. Basically, it acts as
a director that keeps the network running in order.
Advantages of using a NOS include resource sharing (printer sharing, database/file sharing,
application sharing), security, and other housekeeping aspects of a network.
A major difference between a network operating system and a distributed operating system is that in a
NOS, the network being a LAN (small area), users are aware of the multiplicity of machines, whereas in the
case of a DOS, which may be spread over a large area, users are not aware of the multiplicity of machines.
Also, a DOS is an extension of a NOS and it supports even higher levels of abstraction, co-operation
and integration of the machines available on the network.
The environment provided and managed by a NOS consists of a group of loosely connected machines,
i.e. machines that do not share memory but are connected by external interfaces and run
under the control of the NOS. Just like an ordinary OS, a NOS provides services to its users and to
application software that runs on top of the OS layer, but the type of services and the manner in
which they are provided are quite different from those of an ordinary OS.
Broadly, there are two variants of NOS, Client Server and Peer to Peer. In the Client Server model, a
host computer acts like a server and provides the services demanded by client computers. It acts like a
master controller. In the peer to peer model, any computer can provide any service to any
other; for example, one computer may act as a print server, another one may be a file server, etc.
Operating systems like UNIX, Mac OS, Novell Netware, Microsoft Windows Server, and
Windows NT are examples of a NOS.
Check Your Progress / Self Assessment Questions
Que. 9.True/False: DOS is an extension of NOS?
Que 10. What are the two types of NOS?
12.8 Summary
The operating system is the most fundamental and most important part of a computer system. It is a
type of system software. A digital computer system consists of two major components -
hardware and software. These two are complementary to each other. On the software side, the Operating
System (OS) is one major and core software component of any digital computer system. Major
jobs of an OS include: acting as an interface between user and computer, process management,
CPU scheduling, memory management, disk or storage space management, and device
management. Basic types of OS include single user and multi user, single tasking and
multitasking, single threading and multithreading.
RTOS is Real Time Operating System, its CPU scheduler is designed to provide a predictable or
deterministic execution time and pattern. An example of RTOS is FreeRTOS. Distributed
application, distributed system and distributed operating system are different. Distributed
Operating System (DOS) is a design of operating system which manages distributed applications
that are running on multiple computers on a network.
Mobile phones and smart phones also have an operating system; these are called mobile OS. Such an
OS allows not only smart phones but also tablet PCs, PDAs etc. to run applications. It is
specifically designed for mobile devices and for the mobile applications that run on top of it.
Network Operating System (NOS) is an operating system that has some special feature for
managing the host computer i.e. the computer on which it is installed and other computers on the
network (LAN).
12.9 Glossary
OS : Operating System
DOS : Disk Operating System, Distributed Operating System
NOS : Network Operating System
PDA : Personal Digital Assistant
Android: Mobile Operating System by Google.
Windows: PC Operating System by Microsoft.
CUI : Command User Interface
GUI: Graphical User Interface.
RTOS: Real Time Operating System
LAN : Local Area Network
FreeRTOS: A Real Time Operating System
12.10 Answers to Check Your Progress / Self Assessment Questions
Ans 1. Broadly, there are two types of computer memory: internal memory (RAM and ROM) and
external memory (hard disk, CD, pen drive).
Ans 2. Shell is that part of OS that interprets user commands. Kernel is that part of OS that
controls and manages hardware.
Ans 3. Multithreading is multitasking within a process.
Ans. 4. RTOS stands for Real Time Operating System.
Ans 5. RTOS are of two types, Soft RTOS and Hard RTOS.
Ans 6. Features of RTOS include: they are multitasking, they are small sized, they use priority-based
pre-emptive CPU scheduling.
Ans. 7. Loosely coupled and tightly coupled
Ans. 8. Price-performance ratio; resource sharing.
Ans 9. True.
Ans 10. Client Server model and peer to peer model
12.11 References / Suggested Readings
Operating System Concepts by Abraham Silberschartz, Peter Baer Galvin
Fundamentals of Operating System by Anshuman Sharma, Anurag Gupta
12.12 Model Questions
What are the various functions of an OS?
What are the various types of OS?
What is the difference between a Network Operating System and a Distributed Operating
System?
What is a Mobile Operating System?
What are the various Mobile Operating Systems?
Lesson 13. Booting Techniques and Device Drivers
13.0 Objectives
13.1 Introduction
13.2 Booting techniques and subroutines
13.3 Introduction to Device Drivers
13.4 USB and Plug and Play systems
13.5 Summary
13.6 Glossary
13.7 Answers to Check Your Progress / Self Assessment Questions
13.8 References / Suggested Readings
13.9 Model Questions
13.0 Objectives
To know about the meaning and techniques of booting
To know about subroutines
To explore the details of device drivers
To explore the concept of USB (Universal Serial Bus)
To know about PnP (Plug and Play) devices
13.1 Introduction
When any electronic device like a PC or mobile phone is switched on, it loads its operating
system; this is called booting. There are different types of booting. A subroutine is a sub-
program that assists the main program and is called by it to perform a specific task. A
device driver is system software needed by any device that is connected to a PC, for example a
mouse or a keyboard. A bus is an electronic circuit or a collection of wires through which data
travels from one place to another. A bus can be internal or external. USB is one such bus.
Modern devices when connected to a computer do not ask for a restart. They start functioning
immediately; they are called Plug and Play (PnP) devices.
13.2 Booting Techniques
Every computer needs an operating system. An operating system is system software that provides an
interface between a computer and its user. It is a compulsory part of computer software. Initially the
operating system of a computer is saved in external memory, which may be a hard disk. When a
computer starts up, the operating system is loaded from external memory into internal memory so
that it can start working. This process of loading the operating system of an electronic device into
its memory is called booting; this is also termed booting up a computer. On larger computers
like mainframes and minicomputers an alternative term, "Initial Program Load", is used for
booting.
Figure 13.1 Booting Techniques and Types
13.2.1 Types of booting
Depending upon how and when booting takes place, it is of different types. There can be cold
booting or hard booting, warm booting or soft booting, local booting and remote booting.
Cold Booting: When a computer is switched on initially, booting takes place; this is called hard
booting or cold booting. During hard booting POST (Power On Self Test) is also performed, so it
may take comparatively more time. When the booting process completes, the computer lands in the
normal, operative, run-time environment and control is given to the user for further working. It is
called cold booting because the power supply, motherboard and other resources start their work from a
cold condition. A cold boot completely cuts the power from the motherboard, lets it reset all
its components and clears memory completely. A cold boot means the hard drive stops and then
starts spinning again, and various components may cool down.
Warm Booting: Sometimes a computer may need to be restarted because it may hang, or after
installing a new hardware or software it may ask for a reboot. When a computer is restarted or
rebooted, it is called warm booting. Warm booting takes less time because it does not perform the
POST. Actually, various components on the motherboard of the computer do not stop
working during a warm boot, so the BIOS (Basic Input Output System) cannot perform a full
start-up sequence with a POST routine during a warm boot. Warm booting can be performed in
many ways, for example by hitting the restart button on the cabinet, by pressing Ctrl+Alt+Del on the
keyboard, or by using software.
13.2.2 Techniques of Booting
DOS/Local Booting: When the bootstrap loader is available on the hard disk of the machine itself
and the operating system that it loads is also available locally on the same machine, it is called
local booting. Since the concept was very popularly used in PCs with DOS, it is also
called DOS booting. Initially, a small piece of software called the boot loader or bootstrap loader (BSL)
loads the operating system from external memory into internal memory after completion of the
power-on self-tests. The bootstrap loader is itself started by other software stored in ROM (Read
Only Memory). During booting a PC also performs some initial processing called the POST
(Power On Self Test). The exact sequence of booting is given below.
The user turns on the power.
Internally, processor pins and registers are reset to specific values.
After this, the CPU starts executing instructions at the first address of the BIOS (0xFFFF0).
In turn, the BIOS runs the POST (Power-On Self Test) and performs some other necessary
checks. (The BIOS is the firmware in the ROM.)
After this, the BIOS jumps to the Master Boot Record (MBR) program, which is in the first sector of the disk.
The MBR runs the primary boot loader.
The boot loader loads the operating system into memory.
Flash Booting: There is a need to minimize the booting time of embedded systems by
optimizing the processing time from the start of the boot loader to mounting the
operating system. An efficient fast booting technique for embedded systems is based on
flash memory. In embedded systems a small flash boot loader program stays in flash
memory and is always the first, automatic application that runs when the device is switched on.
Commonly used flash boot loader programs are based on the Controller Area Network (CAN) to
provide startup data to the Electronic Control Unit of the device.
Remote/Network Booting: Another technique of booting whereby a computer that does not
have its own hard disk and operating system (and is therefore called dumb terminal) is given
booting instruction from a remote intelligent machine called server.
Thus remote booting refers to booting up a client machine from a server. Remote boot capability
is necessary for diskless workstations and network computers, but sometimes it may also prove
helpful for restarting failed desktop machines. An advantage of remotely booting a computer via
network is that it does not need to have its own hard disk and its own OS, so it results in a less
costly machine. Any further upgrades or changes to Operating System are also easier to make.
Further, removal of the internal hard drive from a machine makes it diskless, and therefore it will
consume less power and will also generate less heat. It also means that machines can be packed more
densely, and there is less need for localized cooling.
But there is a problem also: if, due to some reason, the server that performs remote booting fails, the
remote machine will not boot and will not function. Having a redundant or backup server
can solve this problem. For example, Diskless Remote Boot in Linux (DRBL) is a network file
system server providing a diskless or systemless environment for client machines.
13.2.3 Booting Subroutines:
A subroutine is a subprogram called by the main program that performs some specific task and exits.
The main program may send some values called arguments to the subroutine, which are used by the
subroutine for processing the task it has been assigned. A subroutine may, at the end, send some
processed results back to the main or calling program, but in that case the subroutine is more
specifically called a function subprogram. Some boot programs use BIOS I/O routines to read
from the disk.
Subroutines are heavily used in booting by the ROM BIOS (Read Only Memory - Basic Input
Output System); these are known as booting subroutines.
Generally, a hard disk is divided into multiple partitions and the OS is saved on one of the partitions,
called the primary partition. For a multi-boot system, multiple operating systems can be loaded on
different partitions. A special area, generally a single disk sector of usually 512 bytes, is
reserved near the beginning of a partition, called the PBR or Partition Boot Record, and the code saved
in this sector is called the boot block. After loading this boot block into the internal memory of the
computer, the firmware or ROM BIOS passes control to that code. That initial boot code completes
loading the operating system in several steps. Fetching data from a disk drive is very complex,
and therefore the boot code generally uses subroutines that are in the ROM for reading the disk.
But there can be some limitations in system booting, since full disk reading capabilities are not
available until booting is complete and the operating system has been loaded. Until that time, all
the limitations present in the ROM disk subroutines will affect the booting process.
Check Your Progress / Self Assessment Questions
Que. 1. What are different types of booting?
Que. 2. What is MBR?
Que. 3. Where is Boot block loaded?
13.3 Introduction to Device Drivers
A device driver is a type of system software. It is a small but complex software module that
enables communication between a computer and a connected peripheral device. A device driver
is used to tell the operating system how the device works. Basically, it is another software layer
that lies between the application layer and the hardware layer containing the actual device. A device
driver is actually part of the kernel of an operating system, and therefore it can access kernel
functions. The very basic purpose of a device driver is to instruct the CPU how to communicate
with an input/output device, by translating the operating system's input/output related commands into
a language that the device can understand. A device driver has two interfaces: on one side it has
to interact with the OS kernel, and on the other side it has to interact with the device for which it is
built.
One major function of a device driver is to fetch data from the device buffer and pass it on to the
operating system kernel for further processing. It also does the job of handling and reporting I/O
errors, if any. Device drivers form a major part of an operating system; approximately 70% of
operating system code is device drivers. A typical version of Linux has approximately 36
different device drivers. Most operating system bugs are also due to bugs in device drivers;
approximately 70% of operating system failures are caused by driver bugs. Device drivers are
usually operating system and version specific; for example, Windows XP device drivers may not
be compatible with Windows 7 or later versions. Many device drivers are part of the OS kernel, but
if a device is produced after the OS release, it will come with its own device driver software on a
CD that accompanies the device, or it can be downloaded from the manufacturer's website.
Types of drivers
There are broadly two types of device drivers, real and virtual, and there are two modes of
working of device drivers: user mode drivers and kernel mode drivers. User mode drivers are
those that work in user mode; for example, a printer driver is made for a printer, which is used by the
user. On the other hand, a kernel mode driver is a driver which works in kernel mode; for example,
drivers for cache memory and various motherboard components are kernel level. Some kernel-
mode drivers conform to the Windows Driver Model (WDM), support Plug and Play (PnP)
and power management, and are also source-compatible across various versions of Windows
(Windows 98/Me, Windows 2000 and later operating systems). Kernel level drivers are
171
generally divided into three levels, First being Highest Level. Some file system related drivers
(NTFS, FAT, CDFS) fall in this category, they depend on next level drivers. Intermediate level
or middle level drivers are device-specific. Drivers for specific peripheral devices and software
bus drivers fall into this category. Third is lowest level, system supplied hardware bus drivers
and legacy drivers fall into this category.
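The three levels can be sketched as a chain of calls (a Python simulation; the class names and data below are invented): the file-system driver depends on the device-specific driver, which in turn depends on the bus driver.

```python
# Sketch of the three driver levels: highest (file system) -> intermediate
# (device-specific) -> lowest (bus). All names and data are illustrative.

class BusDriver:                       # lowest level: talks to the hardware bus
    def transfer(self, block_no):
        return f"raw-block-{block_no}"

class DiskDriver:                      # intermediate level: device-specific
    def __init__(self, bus):
        self.bus = bus
    def read_block(self, block_no):
        return self.bus.transfer(block_no)

class FileSystemDriver:                # highest level: e.g. an NTFS/FAT/CDFS driver
    def __init__(self, disk):
        self.disk = disk
    def read_file(self, name, block_no):
        data = self.disk.read_block(block_no)   # depends on the next level down
        return f"{name}: {data}"

fs = FileSystemDriver(DiskDriver(BusDriver()))
print(fs.read_file("report.txt", 7))   # -> report.txt: raw-block-7
```

Each level only calls the level directly below it, which is the dependency structure the paragraph above describes.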
Virtual device drivers
Another type of device drivers is virtual device driver. They handle software interrupts in
contrast to hardware interrupts handled by their counterparts. Whereas an ordinary device driver
has a .dll or .exe file in windows, a virtual device driver has .vxd or .vlm file. The basic purpose
of virtual device drivers is to emulate a hardware device in "virtual machine" environments.
They are further of two types static and dynamic. Actually, device drivers in windows are of
two types : Virtual Device Drivers (VXD) and Windows Driver Model (WDM). Virtual device
drivers are older, and are less compatible with new versions of windows, while real mode drivers
or WDM drivers are supposed to be fully compatible to all windows versions till Windows 98.
Features of a device driver
Abstraction : A device driver works as a "black box" that makes a device usable through a well-defined
programming interface, hiding the details of how the device works.
Unification : Device drivers make similar devices look and work in a similar manner.
Protection : Only authorized applications can access the device, via its associated device
driver.
Check Your Progress / Self Assessment Questions
Que. 4. What is WDM?
Que. 5. Which are two modes of working of a device driver?
Que. 6. What is NTFS?
13.4 USB and Plug and Play systems
13.4.1 USB :
USB stands for Universal Serial Bus. It is a type of external expansion bus for a PC. An expansion
bus is a collection of tiny wires that allows a computer to be expanded with an expansion board
or expansion card, a PCB (Printed Circuit Board) plugged into the motherboard that provides
additional features to a computer system, such as an input/output pathway for transferring
data between computer memory and an expansion device.
Figure 13.2 : USB Logo
Figure 13.3 USB Configuration ( Source : www.usb.org/developers/docs/whitepapers/usb_20g.pdf)
The USB standard was introduced in 1995 to replace the variety of connectors attached to different
types of ports, such as serial, parallel, keyboard, mouse, SCSI and Ethernet ports. It was a joint
venture of big hardware and software companies including Intel, HP, Compaq, Lucent, Microsoft,
NEC and Philips. With USB, any device can be attached at any USB port provided at the back or
front of the CPU cabinet. All USB ports are of the same shape and size, and the cables required to
connect a device to the CPU cabinet are alike, so any cable can be used with any device. This was
not so earlier, when different types of serial and parallel ports each had their own shape and size
and required different types of jacks to be plugged in. Nowadays, most computer manufacturers
provide only USB ports at the back of the CPU cabinet to connect the keyboard, mouse, printer,
scanner, etc., and hardware manufacturers supply only USB cables with their devices; a newly
purchased printer, for instance, can be connected to a PC only via a USB cable. An additional
advantage of USB is that it allows a variety of peripheral devices to be self-configuring: the
user plugs a device into a USB port and lets the OS install the drivers required by the device.
USB thus promises to one day make adding new devices truly plug and play. As the name implies,
USB is a serial bus: data flows as a series of pulses along one pair of wires (in a parallel
connection, data flows in parallel over many pairs of wires, and communication is faster per
clock). An advantage of a serial bus is that the cables are thin, can be longer, and are easier to
use and carry. USB is also a "party line" bus: all devices on the bus share the same communication
channel, and up to 127 devices can be daisy-chained on the same USB connection, as shown in
figure 13.4 below.
Figure 13.4 USB Configuration ( Source : www.usb.org/developers/docs/whitepapers/usb_20g.pdf)
The USB cable also carries a limited amount of power to the devices; this is why a mobile
phone can be connected to a PC with a USB cable and start charging by drawing some power
from the PC. The same USB cable can thus act as both a data cable and a power cable when a
smartphone is connected to a PC.
USB standards and versions are developed by an industry body called the USB Implementers
Forum (USB-IF). Since its release, USB has undergone continuous revision to improve
performance. The initial widely adopted version, USB 1.1 (1998), supported 1.5 Mbps (low speed)
and 12 Mbps (full speed); USB 2.0 raised this to 480 Mbps; USB 3.0, also called SuperSpeed
USB, is theoretically capable of transferring data at 4,800 Mbps (megabits per second), i.e. 4.8
Gbps (gigabits per second). Various USB versions are shown in the figure below.
Figure 13.5 USB Versions. (Source : http://www.digitalcitizen.life/simple-questions-what-
usb-or-universal-serial-bus)
USB connectors : There are different types of USB connectors for different standards of USB.
The original specification named the connectors Type-A and Type-B; the Type-C connector was
developed later, alongside USB 3.1. A Type-A connector is a flat, rectangular interface, and A-A
cables are used to connect Type-A connectors to a USB port. A Type-B connector is used on a USB
peripheral device such as a printer; it is square-shaped and requires an A-B cable. The Type-C
connector, the newest type, used with USB 3.1, is reversible and symmetrically designed, so it
can be plugged into any USB-C port irrespective of its orientation. The Micro-USB connector has
a compact 5-pin design and is used on the charging cables of smartphones.
Figure 13.6 USB Connectors.
Check Your Progress / Self Assessment Questions
Que. 7. What is USB ?
Que. 8. Which is latest standard of USB?
Que. 9. Name any three types of connectors of USB ?
13.4.2 Plug and Play
Plug and Play (abbreviated as PnP) is a combined hardware and software technique; it describes
devices that are ready to work as soon as they are plugged in or connected. The technique was
introduced by Microsoft with its operating system Windows 95: the user plugs in a device, and
the PC automatically detects and configures or installs it quickly with little or no user
involvement, making it ready to use immediately. Microsoft highlighted the technique to boost
sales of Windows 95, though a similar technique had long been built by Apple Inc. into its
Macintosh computers. PnP has three requirements :
PnP BIOS : The firmware of a PC must support PnP so that a device is automatically detected as
soon as it is connected.
ESCD (Extended System Configuration Data) : A data area that stores information about the
various installed PnP devices.
PnP OS : The underlying operating system must also support PnP. For example, Windows has
a component called the Plug and Play manager that interacts with the Hardware Abstraction Layer
of the operating system kernel to make PnP work.
Check Your Progress / Self Assessment Questions
Que. 10. With which OS did Microsoft introduce PnP?
Que. 11. Which are three requirements of PnP?
13.5 Summary
When a computer starts up, operating system is loaded from external memory into internal
memory so that it can start its working. This process of loading operating system of an electronic
device into its memory is called booting; this is also termed as booting up a computer.
Depending upon how and when booting takes place, it is of different types. There can be cold
booting or hard booting, warm booting or soft booting, local booting and remote booting, A
subroutine is a subprogram called by main program that performs some specific task and exits.
Main program may send some values called arguments to the subroutine which are used by
subroutine for processing the task it has been assigned. A device driver is a type of system
software. It is a small but complex software module that enables communication between a
computer and its connected peripheral device. A device driver is used to tell operating system
how does the device work. USB is Universal Serial Bus. It is a type of external expansion bus for
a PC. An expansion bus is a collection of tiny wires that allow for computer expansion with the
use of an expansion board or expansion card. Plug and Play (abbreviated as PnP) is hardware and
software technique and describes devices that get ready to work as soon as they are plugged in or
connected. It is a software technique given initially by Microsoft it operating system Windows
95
13.6 Glossary
Booting : Loading operating system into internal memory of computer at start up.
Cold Booting : When computer starts up from the state when it was powered off.
Warm booting : Rebooting or restarting a computer.
Subroutine: a sub program called by main program to do a specific task.
DOS : Disk Operating System
MBR : Master Boot Record
Sector 0 : First sector of hard disk where MBR is loaded.
USB : Universal Serial Bus
Expansion Bus : Collection of tiny wires that allow for computer expansion with the use
of an expansion board
PnP : Plug and Play, a technique popularized by Microsoft whereby a device is automatically
detected and configured by the computer when the device is attached to it.
13.7 Answers to Check Your Progress / Self Assessment Questions
Ans 1. Cold booting and warm booting
Ans 2. Master Boot Record
Ans 3. Sector 0 of hard disk
Ans. 4. Windows Driver Model. It is a driver model given by Microsoft for kernel-mode device drivers.
Ans 5. User mode and Kernel Mode
Ans 6. NTFS is NT File System (New Technology File System).
Ans. 7. USB is Universal Serial Bus, it is external expansion bus.
Ans. 8. USB 3.1
Ans. 9. Type-A, Type-B, Type-C, Type-A Mini, Type-A Micro
Ans 10. Microsoft Windows 95
Ans. 11. PnP BIOS, ESCD, PnP OS
13.8 References / Suggested Readings
Fundamentals of Operating System by Anshuman Sharma, Anurag Gupta
System programming : Dinesh Gupta, Kalyani Publisher
13.9 Model Questions
1 Which are different types of booting?
2. Which are different techniques of booting?
3. What is a subroutine?
4. Explain USB.
5. Which are different types of connectors used with USB?
6. What is PnP?
Unit 4
Lesson 14.
System Programming API.
Chapter Index
14.0 Objectives
14.1 Introduction
14.2 I/O programming
14.3 Systems Programming (API’s)
14.4 Summary
14.5 Glossary
14.6 Answers to Check Your Progress / Self Assessment Questions
14.7 References / Suggested Readings
14.8 Model Questions
14.0 Objectives
To know the meaning and importance of I/O Programming
To study about various I/O Programming techniques.
To know the meaning of API.
To explore System Programming API.
14.1 Introduction
I/O programming is concerned with programming I/O devices, i.e. how to fetch input from and
deliver output to a device. Most devices are accessed via I/O ports. A port is just a special
memory address that maps to input/output pins on the device; these pins are the actual hardware
interface to the device. I/O programming is of two types, synchronous I/O programming and
asynchronous I/O programming.
API stands for Application Programming Interface. An API can be available in the form of library
files, routines, modules, software tools, built-in functions, DLL (Dynamic Link Library) files
etc., and is used in developing other software and applications. APIs act as support software.
The importance of an API is to make it easier to develop a program by providing the building
blocks, which are then put together by the programmer. Depending on its type and domain, an API
can be a system API, web API, database API etc. APIs support software reusability.
14.2 I/O programming
I/O programming is also known as device programming. Any Input Output device is useless if it
does not communicate with the outer world or with the computer it is attached to. So it needs to
employ I/O techniques for both i.e. getting data from outside world or user and to give stored
data back to the user. I/O devices are controlled either directly or indirectly by their own
processor called controller. Each Input Output device is accessed by main processor through
input output ports. A port is just an address just like memory address. A port maps onto input
pins or registers on the device. Data can be received from or sent to the device in two ways,
synchronously or asynchronously. Synchronous transmission keeps both receiver and sender
synchronized with clock pulse and requires hand shaking. Asynchronous transmission does not
need clock signal and hand shaking between sender and receiver. Any hardware device that can
be programmed uses two processes sensor and actuator,
Input/output for most computers is asynchronous. Input Output device only initiates an
operation, actual transfer takes time depending on the nature of the device, data transfer
operation etc. Each device has a controller chip (an electronic circuit) which controls the device
as instructed by CPU.
14.2.1 IN and OUT instructions
Assembly language provides some instructions for interfacing with input/output devices. Two
popular instructions are IN and OUT. They transfer data between the microprocessor's
accumulator register (AL, AX or EAX) and an I/O device. While executing IN and OUT, the I/O port
address is held in register DX, or given as a fixed byte address immediately following the
opcode.
Figure 14.1 Sample IN, OUT instructions
(Source : http://ece-research.unm.edu/jimp/310/slides/8086_IO1.html)
The IN instruction fetches information from the device, storing the data transferred from the
device at the memory locations given by the effective address; the previous contents of that
memory are overwritten by the new information. The OUT instruction sends data from memory to an
output device; the data is transferred from the memory address specified by the effective address
in the OUT instruction. Both the IN and OUT instructions transfer one record of information at a
time, where a record is the physical unit of information naturally handled by the I/O device.
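The behaviour of IN and OUT can be mimicked in a few lines (a Python simulation, since real port I/O is privileged; the port number and register contents below are made up for illustration):

```python
# Sketch of port-mapped I/O: a port number is just an address that maps onto a
# device register. IN copies device -> accumulator, OUT copies the other way.

ports = {0x60: 0x41}       # pretend device data register at port 0x60 holds 0x41
accumulator = 0            # stands in for AL/AX/EAX

def in_(port):             # trailing underscore: 'in' is a Python keyword
    """IN: read one unit of data from a device register into the accumulator."""
    global accumulator
    accumulator = ports[port]
    return accumulator

def out(port, value):
    """OUT: write a value from the CPU side to a device register."""
    ports[port] = value

in_(0x60)
print(hex(accumulator))    # -> 0x41
out(0x60, 0x42)
print(hex(ports[0x60]))    # -> 0x42
```

On real x86 hardware this corresponds to instructions such as `IN AL, 60H` and `OUT DX, AL`, as in figure 14.1.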
When an IN or OUT instruction starts a data transfer operation, the device may be in the ready
state, waiting for a new command, or it may still be busy with a previous data transfer. If it is
busy, the device starts executing the next waiting command as soon as it finishes the previous
transfer. The device's status, ready or busy, is indicated by the ready/busy bit in the device
controller; the setting and clearing of this bit is controlled entirely by the device itself.
So, if an input/output device is ready, its ready/busy bit will be set and it will take only one
unit of time to complete an input/output instruction. But if the device is busy, its ready/busy
bit will be cleared, and the CPU may have to wait an unpredictable number of time units for the
device to become free, start executing the next instruction, and become busy again. This extra
waiting time is called interlock time.
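The ready/busy handshake and the interlock time can be simulated as follows (a Python sketch; the controller model and the 3-unit transfer time are invented for illustration):

```python
# Sketch: the device clears its ready bit while a transfer runs and sets it
# again when the transfer completes; the CPU waits (interlock time) if busy.

class Controller:
    def __init__(self):
        self.ready = True          # ready/busy bit, owned by the device
        self._pending = 0
    def start_transfer(self, units):
        self.ready = False         # device marks itself busy
        self._pending = units
    def tick(self):                # one unit of device time passes
        if self._pending:
            self._pending -= 1
            if self._pending == 0:
                self.ready = True  # transfer done: device is ready again

def issue_io(ctrl):
    """CPU side: wait until the device is ready, then issue the next I/O."""
    interlock = 0
    while not ctrl.ready:          # busy-wait on the ready/busy bit
        ctrl.tick()
        interlock += 1
    ctrl.start_transfer(units=3)   # pretend each transfer takes 3 units
    return interlock

c = Controller()
print(issue_io(c))   # -> 0  (device was ready: no interlock time)
print(issue_io(c))   # -> 3  (device busy: waited 3 units of interlock time)
```

The second call pays three units of interlock time because the first transfer was still in flight.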
14.2.2 Synchronous and Asynchronous I/O Programming
Synchronous I/O Programming
This technique waits for a function call or an input/output instruction to complete before
starting the next piece of work. It is a convenient, easy and efficient method from the
programmer's point of view, but it is not a good choice in a multitasking environment where many
jobs run simultaneously, as it can create many problems.
Asynchronous I/O Programming
This technique does not force other processes to wait. It is therefore the preferred technique
for programming device drivers in many multitasking operating systems such as Windows;
supporting asynchronous I/O is one of the major design goals in developing device drivers. It is
more complicated than synchronous I/O programming.
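The contrast can be sketched with a slow fake read (Python; threading stands in here for OS-level asynchronous I/O, and the data and delay are invented):

```python
# Synchronous: the caller blocks until the read returns.
# Asynchronous: the caller keeps working and collects the result later.
import threading, time

def slow_read():
    time.sleep(0.05)          # pretend the device takes a while
    return b"data"

# Synchronous style: nothing else happens until the read completes.
result = slow_read()
print(result)                 # -> b'data'

# Asynchronous style: start the read, do other work, then wait for completion.
done = {}
def completion():             # runs in the background like a completion routine
    done["result"] = slow_read()

t = threading.Thread(target=completion)
t.start()
other_work = sum(range(1000)) # the caller is not blocked meanwhile
t.join()                      # wait for the I/O to finish
print(done["result"])         # -> b'data'
```

The extra bookkeeping (the thread, the completion routine, the join) is the added complexity the paragraph above refers to.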
Check Your Progress / Self Assessment Questions
Que 1. What is controller?
Que. 2. Which two states a device can be in at any time?
Que. 3. Which two instructions are used largely for device input output?
Que. 4. Which are two types of I/O Programming techniques?
14.3 Systems Programming (API’s)
14.3.1 What is System Programming
Operating systems restrict application software running in the top layer from directly accessing
critical system resources as a preventive security measure. Instead, operating systems provide
interfaces that application software can use for managing these system resources. For example, an
operating system may prevent a user application from drawing directly to the screen or from
reading or writing system memory directly. For performing these operations, application software
has to use system services through system calls. This is actually what is done by C library
functions like printf or scanf: they make calls to system routines that perform tasks on their
behalf. Similarly, dynamic memory allocation functions of C like malloc and calloc do not
allocate memory directly; they too ultimately use specific built-in system routines for this
purpose.
Such system calls are an interface provided by an OS for application software to use to access
system resources, and they form part of the system API. Writing this type of system API is
system programming.
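The layering can be observed from Python, whose os module exposes thin wrappers over UNIX system calls (pipe, write, read), while print goes through buffered library code first, analogous to write() versus printf in C:

```python
# Sketch: library calls funnel into system calls. os.pipe/os.write/os.read are
# thin wrappers over the corresponding system calls, bypassing library buffering.
import os

r, w = os.pipe()          # pipe() system call: kernel creates two descriptors
os.write(w, b"hello")     # write() system call: kernel copies the bytes in
os.close(w)               # close() system call
data = os.read(r, 100)    # read() system call: kernel copies the bytes out
os.close(r)
print(data)               # -> b'hello'
```

Every one of these operations crosses into the kernel, which is exactly the system-service path that printf and malloc use underneath.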
14.3.2 What is API?
API is Application Programming Interface: a set of files, routines, modules, software tools,
functions etc. used in developing other software and applications. APIs act as support software.
The importance of an API is to make it easier to develop a program by providing the building
blocks, which are then put together by the programmer. An API provides a way for different
software components to interact, enabling data sharing and content sharing among them; it also
supports application reuse. APIs have in fact existed since operating systems first emerged, but
with the increasing complexity of operating systems, Internet applications, database management
software, system software etc., their popularity and use have kept growing. Much system software
provides its own API to reduce the effort of application programmers, who can use the system
library to build their software and concentrate on the main job rather than worrying about how a
particular low-level function is implemented. APIs are developed by experienced senior
application and system programmers. API management refers to ensuring that an API performs well
and consistently and does not affect the performance or security of the backend software
components it exposes and interacts with. An API publisher is an organization that develops APIs
and offers them to internal, partner or third-party developers of client applications. There are
different types of API: a system programming API (such as a file system API or device driver
API), an operating system API (for example the Windows API or Android API), a DBMS API (for
example an Oracle API), or a web API (for example the eBay API). Whatever its type, an API makes
it convenient to develop applications for that system using a programming language. For example,
a developer writing mobile apps for Android-based phones will use the Android API to interact
with the underlying system hardware, such as the rear camera or keypad.
Check Your Progress / Self Assessment Questions
Que. 5. What is API ?
Que. 6. Name any two types of API.
14.3.3 Types of System API
An API can be placed in a category depending on the type of action it performs and the system
function it relates to. Many APIs are used alone, whereas others can be used together to perform
a task or function. Some of the more common types are :
a. List APIs : These are used to return lists of information about something on the system.
b. Retrieve APIs : These APIs return requested information to an application program.
c. Create, change, and delete APIs : This category of APIs works with objects of a
specified type on a system.
d. Other APIs : Miscellaneous APIs that perform a variety of other actions on the
system.
14.3.4 API Parameters
Once it is known what type of API is available and which API is to be used while developing an
application, it is necessary to know the signature of each function the API makes available. The
signature of a function includes its name, its return type and its arguments or parameters. An
API function can have different kinds of parameters; three kinds are :
Mandatory : Parameters that must be given, in the order specified. The function will not
work, and may return an error, if mandatory parameters are not given.
Optional : Parameters that you may or may not specify. If not specified, the function may
assume default values.
Omissible : Parameters that can be omitted. When these parameters are omitted, a null
pointer must be specified as the argument.
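The three kinds of parameter can be illustrated with an invented API function (Python; list_objects and all of its parameters are hypothetical, and None plays the role of the null pointer):

```python
# Sketch of the three parameter kinds on a made-up List API function.

def list_objects(object_type, max_entries=50, filter_name=None):
    # object_type : mandatory - the caller must supply it
    # max_entries : optional  - a default value is assumed if not given
    # filter_name : omissible - None stands in for the null pointer
    names = [f"{object_type}{i}" for i in range(max_entries)]
    if filter_name is not None:
        names = [n for n in names if filter_name in n]
    return names

print(len(list_objects("file")))        # -> 50  (default max_entries used)
print(list_objects("file", 3))          # -> ['file0', 'file1', 'file2']
print(list_objects("file", 20, "1"))    # only the names containing "1"
```

Calling list_objects() with no arguments at all would raise an error, which is the behaviour described for missing mandatory parameters.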
14.3.5 Who should use API
It is a little difficult to learn an API because of its complexity and large number of functions,
procedures, hard-to-remember keywords etc., so APIs are actually used mostly by experienced
application and system programmers who develop complex application-level and system-level
software.
Check Your Progress / Self Assessment Questions
Que. 7. Which are different types of System API?
Que. 8. Which are different types of API parameters?
14.3.6 Windows API
The Windows API is a collection of modules, functions, procedures, interfaces, objects,
structures, unions, macros, constants, data types and many other such programming elements used
to create Windows-based applications; it is known as WinAPI or Win32 API. It was created for
programmers of C and C++ and is used directly to create Windows applications. The Win API is
organized into different categories such as core services, security services, graphics
programming, user interface development, multimedia capabilities, the Windows shell, and
networking services.
First, the core Win API services deal with fundamental resources on Windows such as the file
system, I/O devices, processes and threads, the Windows registry, etc.
Second, the security services provide interfaces for authentication, authorization, encryption
and decryption, etc.
The graphics subsystem includes GDI (Graphics Device Interface), GDI+, DirectX and OpenGL.
UI development provides functionality to create windows and Windows-based controls such as
text boxes and dialog boxes.
The multimedia APIs make available various tools for working with audio, video and graphics
devices.
The Windows shell APIs provide access to the basic functionality provided by the OS shell.
The networking APIs provide access to network-related functions such as FTP, Telnet and SMTP
available in Windows.
Microsoft documents the Win API through MSDN (Microsoft Developer Network). The actual
implementation of the Win API functions is in the form of DLL (Dynamic Link Library) files such
as Kernel32.dll, User32.dll and GDI32.dll placed in the Windows system directory. With each new
release of Windows, the Win API grows in the number of functions available.
14.3.7 UNIX API
The core part of the UNIX OS is its kernel, which provides UNIX's services directly or
indirectly. The kernel is a large, complex program, actually a collection of interacting programs
with many entry points. These entry points provide the services the kernel performs, and the
collection of these kernel entry points constitutes UNIX's API. The kernel is a collection of
separate functions bundled together into one large package, and its API is the collection of
signatures or prototypes of these functions. The UNIX API takes the form of system calls: the
interface provided by UNIX for application software to use to access system resources, and part
of the system API.
A system call is just like an ordinary function call in the sense that it also causes control to
jump to the function body and return to the caller after the function completes. But it is
significantly different, since it is actually a call to a function that is part of the UNIX
kernel and is provided by the developer of the kernel.
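This can be observed from Python, whose os module exposes thin wrappers over the UNIX system calls (open, read, write, close, unlink, getpid); each call below transfers control into the kernel and back, exactly like the ordinary-looking function calls described above:

```python
# Sketch: exercising UNIX system calls through their os-module wrappers.
import os, tempfile

fd, path = tempfile.mkstemp()        # uses the open() system call underneath
os.write(fd, b"kernel says hi")      # write() system call: kernel copies buffer out
os.close(fd)                         # close() system call releases the descriptor

fd = os.open(path, os.O_RDONLY)      # open() system call returns a file descriptor
data = os.read(fd, 64)               # read() system call fills a buffer
os.close(fd)
os.unlink(path)                      # unlink() system call removes the file

pid = os.getpid()                    # getpid() system call: ask the kernel our id
print(data)                          # -> b'kernel says hi'
```

To the caller these look like ordinary functions, yet every one is an entry point into the kernel, which is the distinction the paragraph above draws.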
Figure 14.2 : UNIX API. (Source : http://alandix.com/academic/tutorials/courses/Prog-I.pdf)
14.4 Summary
I/O programming is concerned with programming I/O devices; it is also known as device
programming. Most devices are accessed via I/O ports. A port is just a special memory address
that maps to input/output pins on the device; these pins are the actual hardware interface to the
device. I/O programming is of two types, synchronous and asynchronous. Assembly language
provides instructions for interfacing with input/output devices; two popular instructions are IN
and OUT.
API stands for Application Programming Interface. An API can be available in the form of library
files, routines, modules, software tools, built-in functions, DLL (Dynamic Link Library) files
etc., and is used in developing other software and applications. There are different types of
API, for example a system programming API (such as a file system API or device driver API), an
operating system API (for example the Windows API or Android API), a DBMS API (for example an
Oracle API), or a web API (for example the eBay API). Whatever its type, an API makes it
convenient to develop applications for that system using a programming language. Types of system
API include list APIs, retrieve APIs, and create/change/delete APIs. The Windows API is a
collection of modules, functions, procedures, interfaces, objects, structures, unions, macros,
constants, data types and many other such programming elements used to create Windows-based
applications; it is known as WinAPI or Win32 API, and is organized into categories such as core
services, security services, graphics programming, user interface development, multimedia
capabilities, the Windows shell, and networking services. Microsoft documents the Win API
through MSDN (Microsoft Developer Network). The UNIX API takes the form of system calls: the
interface provided by UNIX for application software to use to access system resources, and part
of the system API.
14.5 Glossary
API : Application Programming Interface
Port : A Port is just a special memory address and maps to input output pins on the device, these
pins are actually hardware interface to the device.
API management : Ensuring that an API performs well and consistently and does not affect the
performance or security of the backend software components it exposes and interacts with.
API Publisher : An organization that develops APIs and offers them to internal, partner or
third-party developers of client applications.
DLL : Dynamic Link Library files on Windows.
Win API : API provided and used by Windows, also known as Win32 API.
MSDN : Microsoft Developer Network. Documentation of Windows API.
14.6 Answers to Check Your Progress / Self Assessment Questions
Ans. 1. A controller is a chip on the device that controls it as directed by the main CPU.
Ans. 2. A device at any time can be in ready state or busy state.
Ans. 3. IN and OUT.
Ans. 4. Synchronous and asynchronous.
Ans 5. API stands for Application Programming Interface, a group of library routines, tools and functions used in developing other software.
Ans 6. System programming API, Web API.
Ans 7. List APIs, Retrieve API, Create or Change API
Ans. 8. Mandatory, Optional and Omissible.
14.7 References / Suggested Readings
Win32 Programming by Brent E. Rector Publisher: Addison-Wesley
Win32 API Programming with Visual Basic by Steven Roman, Publisher :O'Reilly
14.8 Model Questions
1. What is meaning of I/O Programming?
2. Which system calls does Windows provide for I/O Programming?
3. What is API? What is its use?
4. Write a note on Windows API.