system software and languages

Upload: anurag-goel

Post on 14-Apr-2018


  • 7/30/2019 System Software and Languages

    1/55

    SYSTEM SOFTWARE AND LANGUAGES

    INTRODUCTION TO COMPUTER SOFTWARE

    A computer contains two basic parts: (i) hardware and (ii) software. In the first two units we covered hardware issues in some detail. In this unit and in the remaining units of this block we discuss topics related to software. Without software, a computer is just a piece of metal. With software, a computer can store and retrieve data, solve different types of problems, provide a friendly environment for software development, and so on.

    The process of software development is called programming. To program, one needs knowledge of (i) a particular programming language and (ii) a set of procedures (an algorithm) to solve a problem or develop software. The development of an algorithm is basic to computer programming and is an important part of computer science studies. Developing a computer program is a detailed process that requires serious thought, careful planning and accuracy. It is a challenging and exacting task, drawing on the creativity of the programmer.

    Once an algorithm is obtained, the next step towards a solution using a computer is to program the algorithm using mathematical and data processing techniques. Programming languages constitute the vehicle for this stage of problem solving. The development of programming languages is one of the finest intellectual achievements in computer science. It has been said that "to understand a computer, it is necessary to understand a programming language. Understanding them does not really mean only being able to use them. A lot of people can use them without really fully understanding them".

    An operating system is system software that may be viewed as an organized collection of software consisting of procedures for operating a computer and providing an environment for the execution of programs. It acts as an interface between users and the hardware of a computer system.

    There are several important reasons for studying operating systems. Some of them are:

    The user interacts with the computer through the operating system in order to accomplish a task, since it is the user's primary interface with the computer.
    It helps users to understand the inner functions of a computer very closely.
    Many concepts and techniques found in operating systems have general applicability in other applications.

    In this unit we discuss the concepts relating to programming languages, and in the next unit we deal with operating system concepts.

    INTRODUCTION TO SYSTEM SOFTWARE

    Computer software consists of sets of instructions that mould the raw arithmetic and logical capabilities of the hardware units to perform useful tasks.

    In order to communicate with each other, we use natural languages like Hindi, English, Bengali, Tamil, Marathi, Gujarati etc. In the same way, programming languages of one type or another are used to communicate instructions and commands to a computer for solving problems. Learning a programming language requires learning the symbols, words and rules of the language.

    Program and Programming: A computer can neither think nor make any judgment on its own. It is also impossible for a computer to analyse given data independently and follow its own method of solution. It needs a program to tell it what to do. A program is a set of instructions arranged in a sequence that guides the computer to solve a problem. The process of writing a program is called programming. Programming is a critical step in data processing: if the system is not correctly programmed, it delivers results that cannot be used. There are two ways to acquire a program. One is to purchase an existing program, normally referred to as packaged software; the other is to prepare a new program from scratch, in which case it is called customized software. Computer software can be broadly classified into two categories: system software and application software. Today many languages are available for developing software. These languages are designed with specific areas of application in mind. Thus some languages are good for writing system programs, while others are better suited to application software.


    Since a computer can be used for writing various types of application and system software, there are different programming languages.

    i) System Programming Languages: System programs are designed to make the computer easier to use. An example of system software is an operating system, which consists of many other programs for controlling input/output devices, memory, the processor etc. To write an operating system, the programmer needs instructions to control the computer's circuitry (the hardware part), for example instructions that move data from a storage location to a register of the processor. The C and C++ languages are widely used to develop system software.

    ii) Application Programming Languages: Application programs are designed for specific applications, such as payroll processing, inventory control etc. To write programs for payroll processing or other applications, the programmer does not need to control the basic circuitry of a computer. Instead the programmer needs instructions that make it easy to input data, produce output, do calculations, and store and retrieve data. Programming languages that are suitable for such application programs support these instructions but not necessarily the types of instructions needed for the development of system programs.

    There are two main categories of application programs: business programs and scientific application programs. Most programming languages are designed to be good for one category of applications but not necessarily for the other, although there are some general purpose languages that support both types. Business applications are characterized by the processing of large inputs and large outputs and by high volume data storage and retrieval, but they call for only simple calculations. Languages that are suitable for business program development must support high volume input, output and storage but do not need to support complex calculations. On the other hand, programming languages designed for writing scientific programs contain very powerful instructions for calculations but rather poor instructions for input, output etc. Amongst traditionally used programming languages, COBOL (Common Business Oriented Language) is more suitable for business applications, whereas FORTRAN (Formula Translation) is more suitable for scientific applications. Before we discuss languages further, let us briefly look at the categories of software, viz. system and application software.

    SYSTEM SOFTWARE

    Language Translator

    A language translator is system software that translates a computer program written by a user into a machine-understandable form.

    Operating System

    An operating system (OS) is the most important system software and is essential for operating a computer system. An operating system manages a computer's resources, takes care of scheduling multiple jobs for execution, and manages the flow of data and instructions between the input/output units and the main memory. Advances in the field of computer hardware have also helped in the development of more efficient operating systems.

    Utilities

    Utility programs are those that are very often requested by many application programs. A few examples are SORT/MERGE utilities, which are used for sorting large volumes of data and merging them into a single sorted list, formatting utilities etc.

    APPLICATION SOFTWARE

    Application software is written to enable the computer to solve a specific data processing task. A number of powerful application software packages that do not require significant programming knowledge have been developed. These are easier to learn and use than programming languages.

    Although these packages can perform many general and special functions, there are applications for which they are not adequate. In such cases, an application program is written to meet the exact requirements. A user application program may be written using one of these packages or a programming language. The most important categories of software packages available are:

    Data Base Management Software
    Spreadsheet Software
    Word Processing, Desktop Publishing (DTP) and Presentation Software
    Graphics Software
    Data Communication Software
    Statistical and Operational Research Software

    Data Base Management Software

    Database management software is very useful for creating, maintaining and querying databases and for generating reports. Many of today's database management systems are relational database management systems (RDBMS). Many RDBMS packages provide smart assistants for creating simple databases for invoices, orders and contact lists. Many database management systems are available in the market these days, and you can select one based on your needs. For example, if you have only a few databases, then a package like dBase or FoxPro may be good enough. If you require some additional features and have a moderate work load, then Lotus Approach or Microsoft Access is adequate. However, if you have high-end database requirements that call for a multi-user environment, data security, access rights, a very good user interface etc., then you should choose a professional RDBMS package like Ingres, Oracle, Integra etc.

    Accounting Package

    Accounting packages are among the most important packages for an office. Some of the features you may look for in an accounting package are:

    tax planner facility
    facility for producing charts and graphs
    finding accounts payable
    simple inventory control facility
    payroll functions
    on-line connection to stock quotes
    easy creation of invoices

    One of the good packages in this category is Quicken for Windows.

    Communication Package

    Communication software includes software for fax. The fax software market is growing. An important fax package is Delrina's WinFax PRO 4.0. Features such as Remote Retrieval and Fax Mailbox should be looked for in fax software; these features ensure that you receive the fax message irrespective of your location. Another important feature is Fax Broadcast, which allows you to send out huge numbers of faxes without tying up your fax machine all day.

    If you constantly have to transfer files from your notebook computer to a desktop computer, then you need a software program that coordinates and updates documents. One such program is LapLink for Windows. It offers features that are very convenient to use; for example, simply dragging and dropping a file starts a file transfer. This software works whether the two computers are connected by a serial cable, a Novell network or a modem.

    Desktop Publishing Packages

    Desktop publishing packages are very popular in the Indian context. Newer publishing packages also provide certain built-in formats such as brochures, newsletters, flyers etc., which can be used directly. Text that has already been created can be placed in these packages very easily, and so can graphics. Many DTP packages for English and for languages other than English are available. Microsoft Publisher, PageMaker and Corel Ventura are a few popular names. Desktop publishing packages are, in general, better supported on Apple Macintosh computers.

    CATEGORIES OF LANGUAGES

    We can choose any language for writing a program according to the need. But a computer executes programs only after they are represented internally in binary form (sequences of 1s and 0s). Programs written in any other language must be translated to the binary representation of the instructions before the computer can execute them. Programs written for a computer may be in one of the following categories of languages.

    MACHINE LANGUAGE

    This is a sequence of instructions written in the form of binary numbers consisting of 1s and 0s, to which the computer responds directly. Machine language was initially referred to as code, although now the term code is used more broadly to refer to any program text. An instruction prepared in any machine language will have at least two parts. The first part is the command or operation, which tells the computer what function is to be performed. All computers have an operation code for each of their functions. The second part of the instruction is the operand, which tells the computer where to find or store the data that has to be manipulated. Just as hardware is classified into generations based on technology, computer languages also have a generation classification based on the level of interaction with the machine. Machine language is considered to be the first generation language.

    Advantage of Machine Language

    It is faster in execution, since the computer starts executing it directly.

    Disadvantage of Machine Language

    It is difficult to understand and develop a program using machine language. Anybody going through such a program to check it will have a difficult task understanding what will be achieved when the program is executed. Nevertheless, the computer hardware recognizes only this type of instruction code.

    The following program is an example of a machine language program for adding two numbers.

    0011 1110    Load A register with
    0000 0111    value 7
    0000 0110    Load B register with
    0000 1010    value 10
    1000 0000    A = A + B
    0011 1010    Store the result into the memory location
    0110 0100    whose address is 100 (decimal)
    0000 0000
    0111 0110    Halt processing

    ASSEMBLY LANGUAGE

    Assembly language unlocks the secrets of your computer's hardware and software. It teaches you about the way the computer's hardware and operating system work together and how application programs communicate with the operating system. Assembly language, unlike high level languages, is machine dependent. Each microprocessor has its own set of instructions that it can support.

    When we employ symbols (letters, digits or special characters) for the operation part, the address part and other parts of the instruction code, this representation is called an assembly language program. This is considered to be the second-generation language. Machine and assembly languages are referred to as low level languages, since the coding for a problem is at the individual instruction level.

    Each machine has its own assembly language, which depends on the internal architecture of the processor. An assembler is a translator that takes an assembly language program as its input and produces machine language code as its output. The following program is an example of an assembly language program for adding two numbers and storing the result in some memory location.

    LD A,7        Load register A with 7
    LD B,10       Load register B with 10
    ADD A,B       A = A + B
    LD (100),A    Save the result in the location 100
    HALT          Halt process

    From this program, it is clear that the use of mnemonics (in our example LD, ADD and HALT are the mnemonics) has improved the readability of our program significantly.

    A machine cannot execute an assembly language program directly, as it is not in binary form. An assembler is needed in order to translate an assembly language program into the object code executable by the machine. This is illustrated in figure 1.

    Figure 1: Assembler
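    The translation step performed by an assembler can be sketched as a toy program. This is only an illustrative sketch in Python: the names OPCODES and assemble are hypothetical, the bit patterns are copied from the machine language listing earlier in this unit rather than from any particular processor manual, and the 16-bit address is assumed to be stored low byte first.

```python
# Toy assembler sketch: maps the mnemonics of the example program to the
# bit patterns used in the machine language listing of this unit.
# Assumption: the opcode values are illustrative, not a real processor manual.
OPCODES = {
    "LD A,n": 0b00111110,     # load register A with an immediate value
    "LD B,n": 0b00000110,     # load register B with an immediate value
    "ADD A,B": 0b10000000,    # A = A + B
    "LD (nn),A": 0b00111010,  # store A at a 16-bit address (pattern as listed)
    "HALT": 0b01110110,       # stop processing
}


def assemble(program):
    """Translate (mnemonic, operand) pairs into a list of byte values."""
    code = []
    for mnemonic, operand in program:
        code.append(OPCODES[mnemonic])
        if mnemonic in ("LD A,n", "LD B,n"):
            code.append(operand & 0xFF)          # one-byte immediate operand
        elif mnemonic == "LD (nn),A":
            code.append(operand & 0xFF)          # low byte of the address
            code.append((operand >> 8) & 0xFF)   # high byte of the address
    return code


source = [
    ("LD A,n", 7),
    ("LD B,n", 10),
    ("ADD A,B", None),
    ("LD (nn),A", 100),
    ("HALT", None),
]

machine_code = assemble(source)
for byte in machine_code:
    print(f"{byte >> 4:04b} {byte & 0x0F:04b}")
```

    A real assembler also handles labels, directives and error reporting; the table lookup above shows only the core idea of replacing each symbolic instruction with its binary code.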

    Advantage of Assembly Language

    Writing a program in assembly language is more convenient than writing it in machine language. Instead of binary sequences, as in machine language, it is written in the form of symbolic instructions. Therefore, it gives a little more readability.

    Disadvantages of Assembly Language

    An assembly language program is specific to a particular machine architecture. Assembly languages are designed for a specific make and model of microprocessor. This means that assembly language programs written for one processor will not work on a different processor if it is architecturally different. That is why an assembly language program is not portable. An assembly language program also cannot be executed as immediately as a machine language program, because it first has to be translated into machine (binary) language code.

    VARIABLES, CONSTANTS, DATA TYPE, ARRAY AND EXPRESSIONS

    These are the smallest components of a programming language.

    Figure 2: Memory Organization

    Variable

    The first thing we must learn is how to use the internal memory of a computer in writing a program. Memory may be pictured as a series of separate memory cells, as shown in figure 2. Computer memory is divided into several locations, and each location has its own address. Each storage location holds a piece of information. In order to store or retrieve information from a memory location, we must give that particular location a name. Now study the following definition.

    Variable: It is a character or group of characters assigned by the programmer to a single memory location and used in the program as the name of that memory location in order to access the value stored in it.

    For example, in the expression A = 5, A is the name of a memory location, i.e. a variable, where 5 is stored.
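    The idea of a name standing for a memory location can be pictured with a small model. The following is a minimal sketch in Python, not a description of how any real machine allocates memory: the eight-cell memory, the symbol_table dictionary and the choice of cell 3 for A are all assumptions made for illustration.

```python
# A toy model of memory: a list of cells, each with a numeric address.
memory = [0] * 8            # eight memory cells, addresses 0..7

# The symbol table maps a programmer-chosen name to a cell address.
symbol_table = {"A": 3}     # assumption: the name A was assigned to cell 3


def store(name, value):
    """Store a value in the cell that the variable name refers to."""
    memory[symbol_table[name]] = value


def fetch(name):
    """Retrieve the value held in the cell named by the variable."""
    return memory[symbol_table[name]]


store("A", 5)               # corresponds to the expression A = 5
print(fetch("A"))           # value stored in the cell named A
print(memory)               # only cell 3 has changed
```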

    Constant

    A constant has a fixed value, in the sense that two cannot be equal to four. A string constant is simply a sequence of characters, such as "computer", which is a string of 8 characters. A numeric constant can be an integer, representing whole quantities, or a number with a decimal point, representing numbers with a fractional part. The constant is probably the most familiar concept to us, since we have used it in everything that has to do with numbers. Numeric constants can be added, subtracted, multiplied and divided, and also compared to say whether two of them are equal, or whether one is less than or greater than the other.

    As string constants are sequences of characters, a related string constant may be obtained from a given one by chopping off some characters from the beginning or end or both, or by appending another string constant at the beginning or end. For example, from 'Gone with the wind' we can get 'one with ', 'Gone with wind', and so on. String constants can also be compared in a lexicographic (dictionary) sense to say whether two of them are equal or not equal, or whether one is less than or greater than the other.
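    These string operations can be tried out directly. The snippet below is a small sketch in Python, chosen only for illustration; the slice positions and the comparison strings 'apple' and 'banana' are assumptions picked to match the examples in the text.

```python
title = "Gone with the wind"

# Chop characters off the beginning and the end.
middle = title[1:10]

# Chop characters off the end, then append another string constant.
shorter = title[:10] + "wind"

# Lexicographic (dictionary-order) comparison of string constants.
comes_first = "apple" < "banana"
same = "computer" == "computer"

print(repr(middle), repr(shorter), comes_first, same, len("computer"))
```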

    Data type

    In computer programming, the term data refers to anything and everything processed by the computer. There are different types of data processed by the computer: numbers are one type of data and words are another. In addition, the operations that are performed on data differ from one type of data to another. For example, multiplication applies to numbers and not to words or sentences.

    A data type defines a set of related values (integers, numbers with fractions, characters) and a set of specific operations that can be performed on those values. In BASIC, the statement LET A = 15 denotes that A is of a numeric data type because it contains a number, whereas in the statement LET A$ = "BOMBAY", A$ is a variable of character data type. A data type also determines how many contiguous memory cells should be allocated for a particular variable.

    Array

    In programming we deal with large amounts of related data. Without arrays, each data element has to be represented by a separate variable. For example, if we have to analyse the sales performance of a particular company for the last 10 years, we can take ten different variables (names), each one representing the sales of a particular year. If we analyse sales information for more than 10 years, the number of variables increases accordingly. It is very difficult to manage a large number of variables in a program. To deal with such a situation an array is used. An array is a collection of data of the same type (either string or numeric), all of which are referenced by the same name. For example, a list of 5 years of sales information of a company can be referred to by the same array name A.

    A(1)      A(2)        A(3)        A(4)        A(5)
    50,000    1,00,000    5,00,000    8,00,000    9,00,000

    A(1) specifies the sales information of the first year
    A(2) specifies the sales information of the second year
    A(3) specifies the sales information of the third year
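    The sales table can be written as a single array in most languages. The following is a minimal sketch in Python; the helper sales_for_year is a hypothetical name, and because Python lists are indexed from 0, the helper subtracts 1 to mimic the A(1) to A(5) numbering used above.

```python
# Sales figures for five years held under a single name.
# Python lists are indexed from 0, so A(1) in the text is sales[0] here.
sales = [50_000, 100_000, 500_000, 800_000, 900_000]


def sales_for_year(year):
    """Return the sales of the given year (1-based, like A(1)..A(5))."""
    return sales[year - 1]


total = sum(sales)                          # one loop-free pass over all years
best_year = sales.index(max(sales)) + 1     # 1-based year with highest sales
print(sales_for_year(3), total, best_year)
```

    Adding a sixth year only means appending one more element; no new variable name is needed.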

    Expression

    We know that we can express intended arithmetic operations using expressions such as X + Y + Z and so on. Several simple expressions can even be nested together using parentheses to form complex expressions. Every computer language specifies an order in which the various arithmetic operators are evaluated in a given expression. An expression may contain operators such as:

    Parentheses                 ( )
    Exponentiation              ^
    Negation                    -
    Multiplication, division    *, /
    Addition, subtraction       +, -

    The operators are evaluated in the order given above. For example, the expression 2+8*(4 - 6/3) can be considered to be evaluated as follows:

    2+8*(4 - 6/3)    The sub-expression (4 - 6/3) is taken up first.
    2+8*(4 - 2)      The division 6/3 within (4 - 6/3) has higher priority than the subtraction, so it is performed first.
    2+8*2            The subtraction (4 - 2) is performed next; (4 - 6/3) is now complete.
    2+16             8*2 is executed before the addition.
    18               The result 16 is then added to 2, that is 16 + 2 = 18.
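    The same evaluation can be traced step by step in code. This is a small sketch in Python, used here only as a convenient calculator; the intermediate variable names are hypothetical, and note that Python's / operator produces a floating-point value, so the result prints as 18.0.

```python
# Evaluate 2 + 8 * (4 - 6 / 3) one step at a time, mirroring the text.
inner_division = 6 / 3                  # division inside the parentheses
inner = 4 - inner_division              # the parenthesised sub-expression
product = 8 * inner                     # multiplication before addition
result = 2 + product

# Python applies the same priorities when given the whole expression.
direct = 2 + 8 * (4 - 6 / 3)

# Fully parenthesised form, which leaves nothing to the priority rules.
explicit = 2 + (8 * (4 - (6 / 3)))
print(result, direct, explicit)
```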

    It is useful to remember the order of priority of the various operators. But it is safer to simplify expressions and enclose them in parentheses to avoid unpleasant surprises. So far we have focused on arithmetic expressions, but the expression is a very general concept. We mentioned earlier that, apart from arithmetic operations, we could compare numbers or strings. We do this by using relational operators in expressions.

    The following is a list of relational operators:

    =     equal to
    <>    not equal to
    <     less than
    >     greater than
    <=    less than or equal to
    >=    greater than or equal to

    These operators have the same level of priority among themselves but a lower priority than the arithmetic operators mentioned earlier. A relational expression results in one of the truth values, either TRUE or FALSE. In languages that represent truth values as numbers, when a relational expression such as (3 > 5) is evaluated to be FALSE, the value 0 is assigned, whereas (5 < 7) is evaluated to be TRUE and the value 1 is assigned.
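    The truth values described above can be observed directly. The snippet below is a brief sketch in Python, one of the languages in which a truth value converts to 0 or 1; the variable names are hypothetical.

```python
# Relational expressions produce a truth value.
first = 3 > 5        # FALSE
second = 5 < 7       # TRUE

# Converting the truth values gives the 0 and 1 described in the text.
print(int(first), int(second))

# Arithmetic has higher priority than comparison: this is (2 + 3) > 4.
third = 2 + 3 > 4
print(third)
```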

    Note that relational expressions are capable of comparing only two values separated by an appropriate relational operator. If we want to express an idea such as whether the number 7 happens to lie between two other numbers 4 and 10, we may be tempted to write the relational expression 4 < 7 < 10 ...

    ... 2) OR (7 > 2) is TRUE


    XOR    TRUE only if one of the adjoining expressions is TRUE and the other is FALSE. XOR has the same priority as OR.    (4 < 7) XOR (7 < 10) is FALSE.
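    The XOR rule can be checked with a few lines of code. This is a hedged sketch in Python: the helper xor is a hypothetical name (Python has no XOR keyword for truth values, so inequality of the two values is used instead), and joining two comparisons with "and" is shown as one common way to test that 7 lies between 4 and 10.

```python
def xor(p, q):
    """TRUE only if exactly one of the two truth values is TRUE."""
    return p != q


left = 4 < 7     # TRUE
right = 7 < 10   # TRUE

both_true_case = xor(left, right)     # both TRUE, so XOR is FALSE
one_true_case = xor(3 > 5, 5 < 7)     # exactly one TRUE, so XOR is TRUE
between = left and right              # 7 lies between 4 and 10

print(both_true_case, one_true_case, between)
```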

    ASSEMBLY LANGUAGE FUNDAMENTALS

    The best way to learn to write assembly language programs is first to study a simple program written in assembly language. We shall do just that in this section.

    A Sample Program

    ; ABSTRACT   : This program adds two 8-bit words in the memory
    ;            : locations called NUM1 and NUM2. The result is
    ;            : stored in the memory location called RESULT. If
    ;            : there was a carry from the addition it will be stored
    ;            : as 0000 0001 in the location CARRY.
    ; ALGORITHM  :
    ;              get NUM1
    ;              add NUM2
    ;              put sum into memory at RESULT
    ;              position carry in LSB of byte register
    ;              mask off upper seven bits
    ;              store the result in the CARRY location
    ;
    ; PORTS      : None used
    ; PROCEDURES : None used
    ; REGISTERS  : Uses CS, DS, AX

    DATA    SEGMENT
    NUM1    DB 15h              ; First number stored here
    NUM2    DB 20h              ; Second number stored here
    RESULT  DB ?                ; Put sum here
    CARRY   DB ?                ; Put any carry here
    DATA    ENDS

    CODE    SEGMENT
            ASSUME CS:CODE, DS:DATA
    START:  MOV AX, DATA        ; Initialize data segment
            MOV DS, AX          ; register
            MOV AL, NUM1        ; Get the first number
            ADD AL, NUM2        ; Add it to 2nd number
            MOV RESULT, AL      ; Store the result
            RCL AL, 01          ; Rotate carry into LSB
            AND AL, 00000001B   ; Mask out all but LSB
            MOV CARRY, AL       ; Store the carry result
            MOV AX, 4C00h       ; DOS function 4Ch: terminate the program
            INT 21h             ; Call DOS
    CODE    ENDS
            END START
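    The arithmetic carried out by the sample program can be mirrored in a high level language to check what ends up in RESULT and CARRY. The following is a sketch in Python, not a translation of the assembler code: the function add_bytes is a hypothetical name, and the second pair of inputs (F0h and 20h) is an assumed example chosen to force a carry.

```python
def add_bytes(num1, num2):
    """Add two 8-bit values; return (result, carry) like the sample program."""
    total = num1 + num2
    result = total & 0xFF          # the 8 bits that fit in the AL register
    carry = (total >> 8) & 0x01    # carry out of bit 7, kept as 0 or 1
    return result, carry


result, carry = add_bytes(0x15, 0x20)    # NUM1 and NUM2 from the listing
print(f"{result:02X}h carry={carry}")

# Hypothetical inputs that overflow eight bits and so produce a carry.
overflow_result, overflow_carry = add_bytes(0xF0, 0x20)
print(f"{overflow_result:02X}h carry={overflow_carry}")
```

    With the values in the listing, the sum 35h fits in one byte, so CARRY holds 0; the masking step matters only when the sum exceeds FFh.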

    The program contains certain additional mnemonics, in addition to the instructions you have studied so far. These are called assembler directives or pseudo operations. They are directions for the assembler. Their meaning is valid only at assembly time, and no code is generated for them.

    SEGMENT and ENDS Directive


    The SEGMENT and ENDS directives are used to identify a group of data items or a group of instructions, called a segment. These directives are used in the same way as parentheses are used in algebra, to group like items together. A group of data statements or instructions placed between the SEGMENT and ENDS directives is said to constitute a logical segment. This segment is given a name. In our example CODE and DATA are the names given to the code and data segments respectively.

    Each segment must have a unique name. There can be no blanks within the segment name, and the name can be up to 31 characters long. Mnemonics and other reserved words are not allowed as segment names or labels.

    Data Definition Directives

    In assembly language, we define storage for variables using data definition directives. Data definition directives create storage at assembly time and can even initialize a variable to a starting value. The directives are summarized in the following table:

    Directive    Description           Number of bytes    Attribute
    DB           Define byte           1                  byte
    DW           Define word           2                  word
    DD           Define double-word    4                  double word
    DQ           Define quadword       8                  quad word
    DT           Define 10 bytes       10                 ten bytes

    As we see from the table above, the variable being defined is given an attribute. The attribute refers to the basic unit of storage used when the variable was defined. These variables can be given a name as follows:

    Example

    CHAR_VAR  DB 'A'        ; CHAR_VAR = 41h
    WORD_VAR  DW 01234h     ; a hex number must begin with a digit (0 to 9)
    LIST      DB 1,2,3,4    ; list of 4 bytes initialized by numbers 1,2,3,4
    NUM       DW 4200
    DEN       DB 20

    The DUP directive is used to duplicate a basic data definition 'n' number of times. Example:

    ARRAY     DB 10 DUP (0)

    This defines an array ARRAY of 10 data bytes, each byte initialized to 0. The initial value can be anything acceptable to the basic data type.

    The EQU directive is used to give a name to a constant. Example:

    CONS      EQU 20

    This defines a constant with the value 20. Now, wherever you want to use 20 in your program, you can use the name instead. The advantage is this: suppose you want to change the value of CONS to, say, 10 at some point. Instead of making changes everywhere in the program, you just have to change the EQU definition and assemble the program again. The change will be made automatically at all places.

    The numbers used in data statements can be octal, binary, hexadecimal, decimal or ASCII. The following are examples of each type:

    TEMP_MAX   DB 01101100B    ; Binary
    OLD_VAL    DW 7341Q        ; Octal
    DECIMAL    DB 49           ; Decimal
    HEX_VAL    DW 03B2Ah       ; Hex
    ASCII_VAL  DB 'EXAMPLE'    ; ASCII
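    To see what values these notations stand for, the same constants can be written in a language that prints them in decimal. The following is a sketch in Python; it assumes that the octal example consists of the digits 7341 followed by an octal suffix, and the last line illustrates the 8086 convention of storing the low byte of a word first.

```python
# The same constants written as Python literals.
temp_max = 0b01101100         # binary, as in 01101100B
old_val = 0o7341              # octal digits 7341 (assumed reading)
decimal = 49                  # decimal
hex_val = 0x3B2A              # hexadecimal, as in 03B2Ah
ascii_val = b"EXAMPLE"        # one byte per ASCII character

print(temp_max, old_val, decimal, hex_val, list(ascii_val))

# A DW value occupies two bytes; the 8086 stores the low byte first.
word_bytes = hex_val.to_bytes(2, "little")
print(word_bytes.hex())
```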


    The ASSUME Directive

    The 8086 has four types of segments, discussed in the previous unit. A program can define more than one code segment, data segment or extra segment. However, only one of each type can be active at a time. The ASSUME directive tells the assembler which segment is to be used as the active segment at any instant, and therefore the segment with respect to which it has to calculate the offsets of variables or instructions. It is usually placed immediately after the SEGMENT directive in the code segment, but you can have as many additional ASSUME directives as you like. Each time an ASSUME is encountered, the assembler starts to calculate offsets with respect to that segment. In the example above, CODE and DATA are the two segments defined, one each for code and data.

    Initializing Segment Registers

    ASSUME is only a directive, used to calculate the offsets of variables, instructions or stack elements with respect to a specific segment of the appropriate type. It does not initialize the segment registers. Initialization of the segment registers has to be done explicitly using MOV instructions, as follows:

    MOV AX, DATA
    MOV DS, AX

    The above statements initialize the data segment register. A segment register cannot be loaded directly with an immediate value such as a segment name; therefore the segment name is first moved into a general purpose register, which is then moved into the segment register. All segment registers can be initialized in the same manner, except the code segment register, which is initialized automatically by the loader.

    END Directive

    The END directive tells the assembler to stop reading and assembling the program from that point on. Any statement after END will be ignored by the assembler. There can be only one END in the program, and it is the last statement of the program.

    THE ASSEMBLY LANGUAGE PROGRAMS

    Assembly language programs can be written in two ways: one in which all code and data are written as part of one segment, called COM programs, and the other in which there is more than one segment, called EXE programs. We shall study each of them in brief, looking at their advantages and disadvantages.

    COM Programs

    A COM (Command) program is simply a binary image of a machine language program. It is loaded into memory at the lowest available segment address. The program code begins at offset 100h, because the first 256 bytes (100h) of the segment are occupied by the program segment prefix (PSP) that DOS creates for the program; the interrupt vector table discussed in the earlier section occupies the first 1K of memory, outside the program's segment. All segment registers are set to the base segment address of the program.

    A COM program keeps its code, data and stack within the same segment. Thus, its total size should not exceed 64K bytes. A sample COM program is shown below. The program's only segment (CSEG) must be declared explicitly using segment directives.

    ; TITLE ADD TWO NUMBERS AND STORE THE CARRY IN A THIRD VARIABLE
    CSEG    SEGMENT
            ASSUME CS:CSEG, DS:CSEG, SS:CSEG
            ORG 100h
    START:  MOV AX, CS          ; Initialize data segment register
            MOV DS, AX          ; (DOS already sets DS = CS for a COM program)
            MOV AL, NUM1        ; Get the first number
            ADD AL, NUM2        ; Add it to 2nd number
            MOV RESULT, AL      ; Store the result
            RCL AL, 01          ; Rotate carry into LSB
            AND AL, 00000001B   ; Mask out all but LSB
            MOV CARRY, AL       ; Store the carry result
            MOV AX, 4C00h       ; DOS function 4Ch: terminate the program
            INT 21h             ; Call DOS
    NUM1    DB 15h              ; First number stored here
    NUM2    DB 20h              ; Second number stored here
    RESULT  DB ?                ; Put sum here
    CARRY   DB ?                ; Put any carry here
    CSEG    ENDS
            END START

    The ORG directive sets the location counter to offset 100h before any instruction is generated. A COM program takes up less space on disk than an EXE program. In spite of this, it allocates all available RAM when loaded. COM programs require at least one full segment, because they automatically place their stack at the end of the segment.

    EXE Programs

    An EXE program is stored on disk with the extension EXE. EXE programs are longer than COM programs, because each EXE program has an EXE header followed by a load module containing the program itself. The EXE header is of fixed 256 bytes and contains information that DOS uses to calculate the addresses of segments and other components correctly. We will not go into the details of these.

    The load module consists of separate segments, which may be thought of as reserved areas for instructions, variables and the stack. An EXE program may contain up to 64K segments, although at most four segments may be active at any time. The segments may be of variable size, the maximum being 64K bytes.

    Advantages of EXE programs are:

    EXE programs are better suited to debugging.

    EXE-format assembler programs are more easily converted into subroutines for high-level languages.

    EXE programs are better suited to memory management. They are more easily relocatable, because there is no ORG statement forcing the program to be loaded from a specific address. Also, to make full use of a multitasking operating system, programs must be able to share computer memory and resources, and an EXE program is easily able to do this.

    ASSEMBLER / MACRO PROCESSOR

    INTRODUCTION

    Computers have changed a lot since the days when people used to communicate with them by on and off switches denoting primitive instructions. With present day computers, interaction has become more user-friendly because of the advancement in hardware and software tools. One category of software which assists in the mechanics of software development is system software. Assembler, linker/loader, compiler and operating system all belong to the realm of system software.

    We discussed several components of programming languages, the basic definitions of assemblers, compilers and interpreters, and the differences among them. In this unit our focus will be on the implementation and use of assemblers. We will also cover broadly the use of macro processors, loaders and linkers. This unit is organized as follows:

    ASSEMBLER

    Assembler Implementation

    An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader (Figure 1).


    Fig. 1: Assembler

    For example, externally defined symbols (such as library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory and place the values of these symbols in the calling program. Here we will discuss the different approaches to the design of an assembler and its related programs.

    Assembler and its related Program

    The assembler-language program contains three kinds of entities. Absolute entities include operation codes, numeric and string constants and fixed addresses. The values of absolute entities are independent of which storage locations the resulting machine code will eventually occupy.

    Relative entities include the addresses of instructions and of working storage. These are fixed only with respect to each other, and are normally stated relative to the address of the beginning of the module. An externally defined entity is used within a module but not defined within it. Whether it is absolute or relative is not necessarily known at the time the module is translated.

    The object program includes identification of which addresses are relative, which symbols are defined externally, and which internally defined symbols are expected to be referenced externally. In the modules in which the latter are used, they are considered to be externally defined. These external references are resolved for two or more object programs by a linker. The linker accepts the several object programs as input and produces a single program ready for loading, hence termed a load module.

    The load module is free of external references and consists essentially of machine-language code accompanied by a specification of which addresses are relative. When the actual main storage locations to be occupied by the program become known, a relocating loader reads the program into storage and adjusts the relative addresses to refer to those actual locations. The output from the loader is a machine-language program ready for execution. The overall process is depicted in Figure 3. If only a single source-language module containing no external references is translated, it can be loaded directly without intervention by the linker. In some programming systems the format of linker output is sufficiently compatible with that of its input to permit the linking of a previously produced load module with some new object modules.

    The functions of linking and loading are sometimes both effected by a single program, called a linking loader. Despite the convenience of combining the linking and loading functions, it is important to realize that they are distinct functions, each of which can be performed independently of the other.
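
    The adjustment performed by a relocating loader can be illustrated with a minimal Python sketch. The representation of a load module as a list of (word, is_relative) pairs and the function name relocate are assumptions made only for this illustration; real load module formats differ from system to system.

```python
def relocate(module, load_address):
    """Return the machine words of a load module adjusted for load_address.

    `module` is a list of (word, is_relative) pairs. This layout is an
    assumption of the sketch, not the format of any particular loader.
    Absolute words (such as operation codes) are copied unchanged;
    relative addresses are offset by the actual load address.
    """
    return [word + load_address if is_relative else word
            for word, is_relative in module]


# Hypothetical module fragment: 13 and 12 stand for operation codes
# (absolute), the other words are addresses relative to the module start.
module = [(13, False), (33, True), (35, True), (12, False), (38, True)]
loaded = relocate(module, 200)
print(loaded)
```

    Loading the same module at a different address only changes the second argument; the module itself is never retranslated, which is the point of keeping the relative/absolute distinction in the object code.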


    Fig. 3 : Program Translation

    LOAD AND GO ASSEMBLER

    The simplest assembler program is the load and go assembler. It accepts as input a program whose instructions are essentially in one-to-one correspondence with those of machine language, but with symbolic names used for operators and operands. It produces machine language as output, which is loaded directly into main memory and executed. The translation is usually performed in a single pass over the input program text. The resulting machine language program occupies storage locations which are fixed at the time of translation and cannot be changed subsequently. The program can call library subroutines, provided that they occupy other locations than those required by the program. No provision is made for combining separate subprograms translated in this manner. The load and go assembler forgoes the advantages of modular program development. Among the most important of these are:

    (1) the ability to design, code and test different program components in parallel;

    (2) a change in one particular module does not require scanning the rest of the program.

    Most assemblers are therefore designed to satisfy the desire to create programs in modules. These module assemblers are generally developed as two-pass translators. During the first pass the assembler examines the assembler-language program and collects the symbolic names into a table. During the second pass, the assembler generates code which is not quite in machine language. It is rather in a similar form, sometimes called "relocatable code" and here called object code. The program module in object-code form is typically called an object module.

    ONE-PASS MODULE ASSEMBLER

    The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
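
    One possible restriction is that the whole translated program stays in main storage until the end of the pass, so that words containing forward references can be filled in later ("backpatched") once the label is finally seen. The Python sketch below illustrates only this one scheme. The tiny operation table, the tuple format of the source lines and the function name assemble_one_pass are hypothetical choices for the illustration, not part of the text.

```python
# Hypothetical operation codes for the sketch (mnemonic -> machine code).
OPCODES = {"LOAD": 3, "ADD": 2, "STORE": 7, "JMP": 0, "STOP": 11}


def assemble_one_pass(lines):
    """Assemble (label, operation, operand) tuples in a single pass.

    Forward references are recorded in `pending` and patched as soon as
    the missing label is defined. This works only because the generated
    words are still held in memory, which is the assumed restriction.
    """
    memory = []    # machine words generated so far
    symbols = {}   # label -> address
    pending = {}   # undefined symbol -> indexes of words awaiting its address
    for label, operation, operand in lines:
        if label:
            symbols[label] = len(memory)
            for index in pending.pop(label, []):
                memory[index] = symbols[label]      # backpatch
        if operation == "SPACE":
            memory.append(None)                     # reserved, no value
            continue
        memory.append(OPCODES[operation])
        if operand is not None:
            if operand in symbols:
                memory.append(symbols[operand])     # backward reference
            else:
                pending.setdefault(operand, []).append(len(memory))
                memory.append(None)                 # placeholder
    if pending:
        raise ValueError("undefined symbols: " + ", ".join(sorted(pending)))
    return memory, symbols


PROGRAM = [
    (None, "LOAD", "X"),      # forward reference to X
    (None, "ADD", "Y"),       # forward reference to Y
    (None, "STORE", "X"),
    (None, "JMP", "DONE"),    # forward reference to DONE
    ("DONE", "STOP", None),
    ("X", "SPACE", None),
    ("Y", "SPACE", None),
]
memory, symbols = assemble_one_pass(PROGRAM)
print(memory, symbols)
```

    The price of the single pass is the extra bookkeeping table and the need to keep the output addressable until every label has been seen.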

    TWO PASS ASSEMBLER

    Most assemblers are designed in two passes (stages); therefore, they are called Two-Pass Assemblers. The pass-wise grouping of tasks in a two-pass assembler is given below:

    Pass I

    Separate the symbols, mnemonic op-code and operand fields.

    Determine the storage requirement for every assembly language statement and update the location counter.

    Build the symbol table (the table that is used to store each label and its corresponding value).

    Pass II

    Generate object code.

    FUNCTION

    The program of figure 4, although written in a hypothetical assembler language, contains the basic elements which need to be translated into machine language. (It is not essential for students to understand the meaning of each statement of the program.) For ease of reference, each instruction is identified by a line number, which is not part of the program. Each instruction in our language contains either an operation specification (lines 1-15) or a storage specification (lines 16-21). An operation specification is a symbolic operation code, which may be preceded by a label and must be followed by 0, 1, or 2 operand specifications, as appropriate to the operation. A storage specification is a symbolic instruction to the assembler. In our assembler language, it must be preceded by a label and must be followed, if appropriate, by a constant. Labels and operand specifications are symbolic addresses; every operand specification must appear somewhere in the program as a label.

    Line   Label   Operation   Operand 1   Operand 2

    1              COPY        ZERO        OLDER
    2              COPY        ONE         OLD
    3              READ        LIMIT
    4              WRITE       OLD
    5      FRONT   LOAD        OLDER
    6              ADD         OLD
    7              STORE       NEW
    8              SUB         LIMIT
    9              JMPPOS      FINAL
    10             WRITE       NEW
    11             COPY        OLD         OLDER
    12             COPY        NEW         OLD
    13             JMP         FRONT
    14     FINAL   WRITE       LIMIT
    15             STOP
    16     ZERO    CONST       0
    17     ONE     CONST       1
    18     OLDER   SPACE
    19     OLD     SPACE
    20     NEW     SPACE
    21     LIMIT   SPACE

    figure 4 : Sample Assembler-Language Program

    Operation Code                No. of
    Symbolic   Machine   Length   Operands   Action

    ADD        02        2        1          ACC <- ACC + OPD1
    JMP        00        2        1          Jump to OPD1
    JMPNEG     05        2        1          Jump to OPD1 if ACC < 0
    JMPPOS     01        2        1          Jump to OPD1 if ACC > 0
    JMPZERO    04        2        1          Jump to OPD1 if ACC = 0
    COPY       13        3        2          OPD2 <- OPD1
    DIVIDE     10        2        1          ACC <- ACC / OPD1
    LOAD       03        2        1          ACC <- OPD1
    MULT       14        2        1          ACC <- ACC x OPD1
    READ       12        2        1          OPD1 <- input stream
    STOP       11        1        0          Stop execution
    STORE      07        2        1          OPD1 <- ACC
    SUB        06        2        1          ACC <- ACC - OPD1
    WRITE      08        2        1          Output stream <- OPD1

    figure 5 : Instruction Set

    Our hypothetical machine has a single accumulator and a main storage of unspecified size. Its 14 instructions are listed in Figure 5. The first column shows the symbolic operation code and the second gives the machine-language equivalent (in decimal). The fourth column specifies the number of operands, and the last column describes the action which ensues when the instruction is executed. In that column "ACC", "OPD1", and "OPD2" refer to the contents of the accumulator, of the first operand location, and of the second operand location, respectively. The length of each instruction in words is 1 greater than the number of its operands.

    Thus if the machine has 12-bit words, an ADD instruction is 2 words, or 24 bits, long. The table's third column, which is redundant, gives the instruction length. If our hypothetical computer had a fixed instruction length, the third and fourth columns could both be omitted.

    The storage specification SPACE reserves one word of storage which presumably will eventually hold a number; there is no operand. The storage specification CONST also reserves a word of storage; it has an operand which is the value of a number to be placed in that word by the assembler.

    The instructions of the program are presented in four fields, and might indeed be constrained to such a format on the input medium. The label, if present, occupies the first field. The second field contains the symbolic operation code or storage specification, which will henceforth be referred to simply as the operation. The third and fourth fields hold the operand specifications, or simply operands, if present.

    Although it is not at all important to our discussion to understand what the example program does, the foregoing specifications of the machine and of its assembler language reveal the algorithm. The program simply computes the so-called Fibonacci numbers (0, 1, 1, 2, 3, 5, 8, ...). This program is also written in the BASIC programming language in Unit 1 of Course 2. Now that we have seen the elements of an assembler-language program, we can ask what functions the assembler must perform in translating it. Here is the list:

    Replace symbolic addresses by numeric addresses.

    Replace symbolic operation codes by machine operation codes.

    Reserve storage for instructions and data.

    Translate constants into machine representation.

    The assignment of numeric addresses can be performed without prior knowledge of what actual locations will eventually be occupied by the assembled program. It is necessary only to generate addresses relative to the start of the program. We shall assume that our assembler normally assigns addresses starting at 0. In translating line 1 of our example program, the resulting machine instruction will therefore be assigned address 0 and occupy 3 words, because COPY instructions are 3 words long. Hence the instruction corresponding to line 2 will be assigned address 3, the READ instruction will be assigned address 6, and the WRITE instruction of line 4 will be assigned address 8, and so on to the end of the program. But what addresses will be assigned to the operands named ZERO and OLDER? These addresses must be inserted in the machine-language representation of the first instruction.

    IMPLEMENTATION

    The assembler uses a counter to keep track of machine-language addresses. Because these addresses will ultimately specify locations in main storage, the counter is called the location counter. Before assembly, the location counter is initialized to zero. After each source line has been examined on the first pass, the location counter is incremented by the length of the machine-language code which will ultimately be generated to correspond to that source line.

    When the assembler first encounters line 1 of the example program, it cannot replace the symbols ZERO and OLDER by addresses, because those symbols make forward references to source-language program lines not yet reached by the assembler. The most straightforward way to cope with the problem of forward references is to examine the entire program text once, before attempting to complete the translation. During that examination, the assembler determines the address which corresponds to each symbol, and places both the symbols and their addresses in a symbol table. This is possible because each symbol used in an operand field must also appear as a label. The address corresponding to a label is just the address of the line which that label marks, that is, the value of the location counter when the line is reached. Building the symbol table requires one pass over the source text. During a second pass, the assembler uses the addresses collected in the symbol table to perform the translation.

    As each symbolic address is encountered in the second pass, the corresponding numeric address is substituted for it in the object code. Two of the most common logical errors in assembler-language programming involve improper use of symbols. If a symbol appears in the operand field of some instruction, but nowhere in a label field, it is undefined. If a symbol appears in the label fields of more than one instruction, it is multiply defined.

    In building the symbol table on the first pass, the assembler must examine the label field of each instruction to permit it to associate the location counter value with each symbol. Multiply-defined symbols will be found on this pass. Undefined symbols, on the other hand, will not be found on the first pass unless the assembler also examines operand fields for symbols. Although this examination is not required for construction of the symbol table, normal practice is to perform it anyhow, because of its value in early detection of program errors. There are many ways to organize a symbol table. The organisation of a symbol table will not be discussed in this Unit.
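
    The two symbol errors described above can be detected with a short check during the first pass. The following Python sketch is only an illustration: the (label, operation, operands) tuple format and the name check_symbols are assumptions, and a dictionary-based count stands in for whatever symbol table organisation a real assembler would use.

```python
from collections import Counter


def check_symbols(lines):
    """Return (undefined, multiply_defined) symbol lists.

    `lines` is a list of (label_or_None, operation, [operand symbols]).
    Labels are counted to find multiply-defined symbols; operand fields
    are examined as well so that undefined symbols are also caught.
    """
    labels = Counter(label for label, _, _ in lines if label)
    used = {name for _, _, operands in lines for name in operands}
    multiply_defined = sorted(n for n, count in labels.items() if count > 1)
    undefined = sorted(used - set(labels))
    return undefined, multiply_defined


# Hypothetical faulty program: X is defined twice, Y is never defined.
faulty = [
    (None, "LOAD", ["X"]),
    (None, "ADD", ["Y"]),
    ("X", "SPACE", []),
    ("X", "SPACE", []),
]
undefined, multiply_defined = check_symbols(faulty)
print(undefined, multiply_defined)
```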

    The state of processing after line 3 is shown in Figure 6. During processing of line 1, the symbols ZERO and OLDER were encountered and entered into the first two positions of the symbol table. The operation COPY was identified, and instruction length information from figure 5 was used to advance the location counter from 0 to 3. During processing of line 2, two more symbols were encountered and entered in the symbol table, and the location counter was advanced from 3 to 6. Line 3 yielded the fifth symbol, LIMIT, and caused the location counter to be incremented from 6 to 8. At this point the symbol table holds five symbols, none of which yet has an address. The location counter holds the address 8, and processing is ready to continue from line 4. Neither the line numbers nor the addresses shown in part (a) of the figure are actually part of the source-language program. The addresses record the history of incrementing the location counter; the line numbers permit easy reference. Clearly, the assembler needs not only a location counter, but also a line counter to keep track of which source line is being processed.

    Line   Address   Label   Operation   Operand 1   Operand 2

    1      0                 COPY        ZERO        OLDER
    2      3                 COPY        ONE         OLD
    3      6                 READ        LIMIT

    (a) Source text scanned

    Symbol   Address

    ZERO     --
    OLDER    --
    ONE      --
    OLD      --        Location counter: 8
    LIMIT    --        Line counter: 4

    (b) Symbol table; counters

    figure 6 : First Pass After Scanning Line 3

    During processing of line 4 the symbol OLD is encountered for the second time. Because it is already in the symbol table, it is not entered again. During processing of line 5, the symbol FRONT is encountered in the label field. It is entered into the symbol table, and the current location counter value, 10, is entered with it as its address. Figure 7 displays the state of the translation at the end of the first pass, after all 21 lines have been processed.

    Line   Address   Label   Operation   Operand 1   Operand 2

    1      0                 COPY        ZERO        OLDER
    2      3                 COPY        ONE         OLD
    3      6                 READ        LIMIT
    4      8                 WRITE       OLD
    5      10        FRONT   LOAD        OLDER
    6      12                ADD         OLD
    7      14                STORE       NEW
    8      16                SUB         LIMIT
    9      18                JMPPOS      FINAL
    10     20                WRITE       NEW
    11     22                COPY        OLD         OLDER
    12     25                COPY        NEW         OLD
    13     28                JMP         FRONT
    14     30        FINAL   WRITE       LIMIT
    15     32                STOP
    16     33        ZERO    CONST       0
    17     34        ONE     CONST       1
    18     35        OLDER   SPACE
    19     36        OLD     SPACE
    20     37        NEW     SPACE
    21     38        LIMIT   SPACE

    (a) Source text scanned

    Symbol   Address

    ZERO     33
    OLDER    35
    ONE      34
    OLD      36
    LIMIT    38
    FRONT    10        Location counter: 39
    NEW      37
    FINAL    30        Line counter: 22

    (b) Symbol table; counters

    Figure 7

    The XX in the object code of Figure 8 can be thought of as a specification to the loader, which will eventually process the object code, that the content of the location corresponding to address 35 does not need to have any specific value loaded. The loader can then just skip over that location. Some assemblers nevertheless specify a particular value for reserved storage locations, often zeros. There is no logical requirement to do so, however, and the user unfamiliar with his assembler is ill-advised to count on a particular value.

    Address   Length   Machine Code

    00        3        13 33 35
    03        3        13 34 36
    06        2        12 38
    08        2        08 36
    10        2        03 35
    12        2        02 36
    14        2        07 37
    16        2        06 38
    18        2        01 30
    20        2        08 37
    22        3        13 36 35
    25        3        13 37 36
    28        2        00 10
    30        2        08 38
    32        1        11
    33        1        00
    34        1        01
    35        1        XX
    36        1        XX
    37        1        XX
    38        1        XX

    Figure 8 : Object Code Generated on 2nd Pass

    The specifications CONST and SPACE do not correspond to machine instructions. They are really instructions to the assembler program. Because of this, we shall refer to them as assembler instructions. Another common designation for them is pseudo-instructions. Neither term is really satisfactory. Of the two types of assembler instructions in our example program, one results in the generation of machine code and the other in the reservation of storage. Later we shall see assembler instructions which result in neither of these actions. The assembler must be able to tell assembler instructions apart from machine operations. One organization is to use a separate table, which is usually searched before the operation code table is searched. Another is to include both machine operations and assembler instructions in the same table. A field in the table entry then identifies the type to the assembler.

    A few variations to the foregoing process can be considered. Some of the translation can actually be performed during the first pass. Operation fields must be examined during the first pass to determine their effect on the location counter. The second-pass table lookup to determine the machine operation code can be obviated at the cost of producing intermediate text which holds the machine operation code and instruction length in addition to the source text.

    Another translation which can be performed during the first pass is that of constants, e.g. from source-language decimal to machine-language binary. The translation of any symbolic addresses which refer backward in the text, rather than forward, could be performed on the first pass, but it is more convenient to wait for the second pass and treat all symbolic addresses uniformly.

    A minor variation is to assemble addresses relative to a starting address other than 0. The location counter is merely initialized to the desired address. If, for example, the value 200 is chosen, the symbol table would appear as in figure 9. The object code corresponding to line 1 would be 200 3 13 233 235.

    Symbol   Address

    ZERO     233
    OLDER    235
    ONE      234
    OLD      236
    LIMIT    238
    FRONT    210
    NEW      237
    FINAL    230

    figure 9 : Symbol Table with Starting Location 200

    If it were known at assembly time that the program is to reside at location 200 for execution, then full object code with address and length need not be generated. The machine code alone would suffice. In this event the result of translation would be the following 39-word sequence.

    13 233 235 13 234 236 12 238 08
    236 03 235 02 236 07 237 06 238
    01 230 08 237 13 236 235 13 237
    236 00 210 08 238 11 00 01 XX
    XX XX XX
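
    The two-pass process described in this section can be sketched in Python for the hypothetical machine. This is a minimal illustration, not a complete assembler. The following are assumptions of the sketch: a label is recognised by starting in the first column, reserved words are written as the string "XX", the mnemonic for the branch-on-positive operation (machine code 01) is taken to be JMPPOS, and the function names are invented for the example.

```python
# Operation table: mnemonic -> (machine code, length in words).
OPTABLE = {
    "ADD": (2, 2), "JMP": (0, 2), "JMPNEG": (5, 2), "JMPPOS": (1, 2),
    "JMPZERO": (4, 2), "COPY": (13, 3), "DIVIDE": (10, 2), "LOAD": (3, 2),
    "MULT": (14, 2), "READ": (12, 2), "STOP": (11, 1), "STORE": (7, 2),
    "SUB": (6, 2), "WRITE": (8, 2),
}

SOURCE = """\
        COPY    ZERO OLDER
        COPY    ONE OLD
        READ    LIMIT
        WRITE   OLD
FRONT   LOAD    OLDER
        ADD     OLD
        STORE   NEW
        SUB     LIMIT
        JMPPOS  FINAL
        WRITE   NEW
        COPY    OLD OLDER
        COPY    NEW OLD
        JMP     FRONT
FINAL   WRITE   LIMIT
        STOP
ZERO    CONST   0
ONE     CONST   1
OLDER   SPACE
OLD     SPACE
NEW     SPACE
LIMIT   SPACE
"""


def parse(line):
    """Split a source line into (label, operation, operands)."""
    fields = line.split()
    label = fields.pop(0) if not line[0].isspace() else None
    return label, fields[0], fields[1:]


def pass_one(lines, origin=0):
    """Build the symbol table by advancing the location counter."""
    symbols, location = {}, origin
    for line in lines:
        label, operation, _ = parse(line)
        if label is not None:
            if label in symbols:
                raise ValueError(f"multiply defined symbol: {label}")
            symbols[label] = location
        # CONST and SPACE each reserve one word.
        location += 1 if operation in ("CONST", "SPACE") else OPTABLE[operation][1]
    return symbols


def pass_two(lines, symbols, origin=0):
    """Generate (address, length, words) records using the symbol table."""
    code, location = [], origin
    for line in lines:
        _, operation, operands = parse(line)
        if operation == "CONST":
            words = [int(operands[0])]
        elif operation == "SPACE":
            words = ["XX"]
        else:
            for name in operands:
                if name not in symbols:
                    raise ValueError(f"undefined symbol: {name}")
            words = [OPTABLE[operation][0]] + [symbols[n] for n in operands]
        code.append((location, len(words), words))
        location += len(words)
    return code


def assemble(source, origin=0):
    lines = [line for line in source.splitlines() if line.strip()]
    symbols = pass_one(lines, origin)
    return symbols, pass_two(lines, symbols, origin)


symbols, code = assemble(SOURCE)
symbols_200, code_200 = assemble(SOURCE, origin=200)
for address, length, words in code:
    print(address, length, *words)
```

    Calling assemble with origin=200 corresponds to the variation of initializing the location counter to 200: only the addresses change, while operation codes and constants stay the same.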

    MACRO PROCESSOR

    The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in his program, the macro processing assembler substitutes the entire block.

    Macro Definition and Usage

    To highlight the salient aspects of a macro processor, consider the following example, which is very similar to the assembly language of Intel's 8-bit microprocessors.

    Example:

    Macro definition:

            MACRO
            INCRMT  &A, &B
            LOAD    &A
            ADD     &B
            STORE   &A
            ENDM

    Macro call in the program:

            INCRMT  X, Y

    Macro expansion:

            LOAD    X
            ADD     Y
            STORE   X

    Figure 10

    A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus, a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition units will exist at the start of the program. Each definition unit names a new operation and defines it to consist of a sequence of assembly language statements.

    In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence. The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In Figure 10, the macro call

            INCRMT  X, Y

    is shown to lead to insertion of the assembly statements

            LOAD    X
            ADD     Y
            STORE   X

    in its place. All macro calls in a program are expanded in this fashion.

    DEFINING A MACRO

    Let us take another look at the macro definition unit appearing in Figure 10. The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.

    The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.

    Positional Parameters

    The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character '&'. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character '&'. These are known as actual parameters.

    The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 10, this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first formal parameter, and so on.

    Consider the prototype and macro call statements once again.

            INCRMT  &A, &B      ... prototype
            INCRMT  X, Y        ... macro call

    We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:

            LOAD    X
            ADD     Y
            STORE   X


    Schematics for Macro-Expansion

    Above we touched upon the fundamental aspects of macro expansion. From the discussion, it appears that the process of macro expansion is similar to language translation. The source program containing macro definitions and calls is translated into an assembly language program without any macro definitions or calls. This program form can now be handed over to a conventional assembler to obtain the target language form of the program.

    In such a schematic (Figure 11), the process of macro expansion is completely segregated from the process of assembling the program. The translator which performs macro expansion in this manner is called a macro pre-processor. The advantage of this scheme is that any existing conventional assembler can be enhanced in this manner to incorporate macro processing. It would reduce the programming cost involved in making a macro facility available to programmers using a computer system. The disadvantage is that this scheme is probably not very efficient, because of the time spent in generating assembly language statements and processing them again for the purpose of translation to the target language.

    Fig. 11 : A pre-processor based scheme for macro assembly

    ISSUES RELATED TO THE DESIGN OF A MACRO PRE-PROCESSOR

    As against this schematic of prefixing a conventional assembler with a macro pre-processor, it is possible to design a macro assembler which not only processes macro definitions and macro calls for the purpose of expansion, but also assembles the expanded statements along with the original assembly statements. The macro assembler should require fewer passes over the program than the pre-processor scheme. This holds out a promise of better efficiency. But for the sake of simplicity, in this section we will discuss the issues related to the implementation of a macro pre-processor instead of an actual implementation.

    Issues related to the Design of a Macro Pre Processor

    Our discussion regarding the definition and use of macros in an assembly program has brought out to some extent the working principles of a macro pre-processor. To summarise, we should be able to differentiate between macro names and invalid operation code mnemonics. On thus recognizing a call on a macro, we should be able to access the text of its definition so that we can expand the call. For generating a statement during expansion, we need to develop a simple scheme for substituting each appearance of a formal parameter with its value. Correspondence between a formal parameter and its value will have to be established for this purpose. It is desirable that, instead of performing this action for every appearance of a formal parameter, the correspondence between formal parameters and their values should be established once and for all, at the start of macro expansion.

    Considerations of positional and keyword correspondence would thus get localized to the start of macro expansion only. This would have the further advantage that no distinction would need to be made between keyword and positional parameters during macro expansion.

    Step 1:

    Scan all macro definitions one by one. For each macro defined:

    enter its name in the Macro Name Table (MNT);

    store the entire macro definition in the Macro Definition Table (MDT);

    add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT.


    Step 2:

    Examine all statements in the assembly source program to detect macro calls. For each macro call:

    locate the macro in the MNT;

    obtain information from the MNT regarding the position of the macro definition in the MDT;

    process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters);

    expand the macro call by following the procedure given in step 3.

    Step 3:

    Process the statements in the macro definition, as found in the MDT, in their expansion time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion time relations between values of formal parameters and expansion time variables.

    In order to have a complete working scheme within the above framework, we need to finalise the following details:

    Method of establishing correspondence between a formal parameter and its value.

    Method of sequencing through the statements comprising a macro definition in expansion time order.

    Method of expanding a model statement.

    Allocation of storage for expansion time variables and access to their values during expansion.
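
    Steps 1 to 3 can be sketched as a small Python pre-processor. This is a simplified illustration under stated assumptions: only positional parameters are supported, macro calls carry no label, the conditional assembly statements AIF and AGO and expansion time variables are not handled, and the function names and table layouts (a dictionary for the MNT, a list for the MDT) are choices made for the example.

```python
def build_tables(lines):
    """Step 1: enter each macro in the MNT and store its body in the MDT.

    Returns (mnt, mdt, program) where mnt maps a macro name to
    (index of its first model statement in mdt, list of formal parameters)
    and program holds the statements outside macro definitions.
    """
    mnt, mdt, program = {}, [], []
    statements = iter(lines)
    for line in statements:
        if line.strip() == "MACRO":
            prototype = next(statements).split(None, 1)
            formals = ([p.strip() for p in prototype[1].split(",")]
                       if len(prototype) > 1 else [])
            mnt[prototype[0]] = (len(mdt), formals)
            for body in statements:          # copy model statements
                mdt.append(body.strip())
                if body.strip() == "ENDM":
                    break
        else:
            program.append(line)
    return mnt, mdt, program


def expand(lines):
    """Steps 2 and 3: replace every macro call by its expanded body."""
    mnt, mdt, program = build_tables(lines)
    output = []
    for line in program:
        fields = line.split(None, 1)
        if fields and fields[0] in mnt:
            start, formals = mnt[fields[0]]
            actuals = ([a.strip() for a in fields[1].split(",")]
                       if len(fields) > 1 else [])
            # Correspondence is established once, at the start of expansion.
            binding = dict(zip(formals, actuals))
            index = start
            while mdt[index] != "ENDM":
                statement = mdt[index]
                # Longest names first, so that &AB is not mistaken for &A.
                for formal in sorted(binding, key=len, reverse=True):
                    statement = statement.replace(formal, binding[formal])
                output.append(statement)
                index += 1
        else:
            output.append(line.strip())
    return output


SOURCE = [
    "MACRO",
    "INCRMT &A, &B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X, Y",
    "STOP",
]
expanded = expand(SOURCE)
print("\n".join(expanded))
```

    The output contains no macro definitions or calls, so it could be handed to a conventional assembler, which is the pre-processor scheme discussed earlier.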

    COMPILER/ LINKER LOADER

    LOADERS

    INTRODUCTION

    The purpose of this section is to discuss the various functions of a loader. The loader is a program which accepts object code and prepares it for execution. Object code produced by an assembler or compiler cannot be executed without modification; as many as four more functions must be performed first. These functions are performed by a loader:

    (i) Allocation of space in main memory for the programs.

    (ii) Linking of programs with each other, for example with library programs.

    (iii) Adjustment of all address-dependent locations, such as address constants, to correspond to the allocated space. This is also called relocation.

    (iv) Physically loading the machine instructions and data into memory.

    Figure 1 shows the function of a loader.

    Fig. 1: Function of a loader.

    Let us examine the need for some of these functions of the loader.

    Linking


    The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a 'stand-alone' nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.

    For example, consider a program written in a high level language like C. Such a program may contain calls to certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
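
    As a rough illustration of what "making addresses known to each other" involves, the following Python sketch builds a global symbol table and then patches each external reference. The module layout (the "code", "defines" and "uses" fields), the instruction mnemonics and the load addresses are all assumptions made up for this example; real object formats are considerably richer.

```python
# Hypothetical object-module layout; the field names are assumptions for illustration.

def link(modules, load_addresses):
    """Build a global symbol table, then patch every external reference."""
    symbols = {}
    for name, module in modules.items():
        base = load_addresses[name]
        for symbol, offset in module["defines"].items():
            symbols[symbol] = base + offset      # absolute address of each entry point

    images = {}
    for name, module in modules.items():
        code = list(module["code"])
        for position, symbol in module["uses"].items():
            if symbol not in symbols:
                raise KeyError(f"unresolved external symbol: {symbol}")
            code[position] = symbols[symbol]     # fill in the callee's address
        images[name] = code
    return symbols, images


modules = {
    # "main" calls printf; the operand of CALL is unknown until link time.
    "main": {"code": ["CALL", None, "HALT"], "defines": {"main": 0}, "uses": {1: "printf"}},
    "libc": {"code": ["PRINT", "RET"], "defines": {"printf": 0}, "uses": {}},
}
symbols, images = link(modules, {"main": 200, "libc": 100})
```

    After linking, the CALL in the main program carries the address at which printf was placed, so control can be transferred to it at execution time.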

    RELOCATION

    Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.

    If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus, A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage locations. Therefore, the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste. It should be noted that relocation is more than simply moving a program from one area of storage to another: it refers to the adjustment of address fields, not to the movement of a program.

    The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called a relocating loader.
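
    The idea of adding a constant to each address field can be sketched in a few lines of Python. The representation is an assumption chosen for illustration: the segment is a list of integer words, and a parallel list of flags (hypothetically named relocation_bits) marks which words hold addresses.

```python
def relocate(words, relocation_bits, load_origin, translated_origin=0):
    """Add a constant to every address-dependent word; leave other words untouched."""
    delta = load_origin - translated_origin
    return [word + delta if is_address else word
            for word, is_address in zip(words, relocation_bits)]


# A segment translated for origin 100: words 1 and 3 hold addresses, the rest are data.
segment = [5, 104, 7, 100]
flags = [False, True, False, True]
relocated = relocate(segment, flags, load_origin=300, translated_origin=100)
```

    Only the flagged words change (104 becomes 304 and 100 becomes 300), which is why a relocating loader needs relocation information in addition to the object code itself.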

    LOADER SCHEMES

    There are several schemes for accomplishing the four loading functions. These schemes are (i) Absolute Loader (ii) Relocating Loader (iii) Direct Linking Loader (iv) Dynamic Loading (v) Dynamic Linking etc.

    Absolute Loader: The task of an absolute loader is virtually trivial. The loader simply accepts the machine language code produced by the assembler and places it into main memory at the location specified by the assembler.

    Relocating Loader: The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer. The input to a relocating loader, as produced by the assembler, is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.

    Direct Linking Loader: It is a general relocatable loader, and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides


    flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs. The other two loader schemes will be discussed in the next section.

    Dynamic Loading And Linking: There are numerous variations on the previously presented loader schemes. One disadvantage of the direct-linking loader, as presented, is that it is necessary to allocate, relocate, link, and load all of the subroutines each time in order to execute a program. Since there may be tens and often hundreds of subroutines involved, especially when we include utility routines such as SQRT etc., this loading process can be extremely time-consuming.

    Furthermore, even though the loader program may be smaller than the assembler, it does absorb a considerable amount of space. These problems can be solved by dividing the loading process into two separate programs: a binder and a module loader. A binder is a program that performs the same functions as the direct-linking loader in binding subroutines together, but rather than placing the relocated and linked text directly into memory, it outputs the text as a file. This output file is in a format ready to be loaded and is typically called a load module. The module loader merely has to physically load the module into main memory. The binder essentially performs the functions of allocation, relocation, and linking; the module loader merely performs the function of loading.

    There are two major classes of binders. The simplest type produces a load module that looks very much like a single absolute loader file. This means that the specific memory allocation of the program is performed at the time that the subroutines are bound together. A more sophisticated binder, called a linkage editor, can keep track of the relocation information so that the resulting load module can be further relocated and thereby loaded anywhere in memory. In this case the module loader must perform additional allocation and relocation as well as loading, but it does not have to worry about the complex problems of linking.

    In both cases, a program that is to be used repeatedly need only be bound once and then can be loaded whenever required. The first binder is relatively simple and fast. The second one (the linkage editor) is somewhat more complex but allows a more flexible allocation and loading scheme.

    Dynamic Loading

    In each of the previous loader schemes we have assumed that all of the subroutines needed are loaded into main memory at the same time. If the total amount of memory required by all these subroutines exceeds the amount available, as is common with large programs on small computers, there is trouble! There are several hardware techniques, such as paging and segmentation, that attempt to solve this problem.

    Usually the subroutines of a program are needed at different times: for example, pass 1 and pass 2 of an assembler are mutually exclusive (pass 1 and pass 2 should not simultaneously occupy memory resources). By explicitly recognizing which subroutines call other subroutines it is possible to produce an overlay structure that identifies mutually exclusive subroutines.

    Figure 2 illustrates a program consisting of five subprograms (A, B, C, D and E) that require 100K bytes of memory. The arrows indicate that subprogram A only calls B, D and E; subprogram B only calls C and E; subprogram D only calls E; and subprograms C and E do not call any other routines. Figure 16(a) highlights the interdependencies between the procedures. Note that procedures B and D are never in use at the same time; neither are C and E. If we load only those procedures that are actually to be used at any particular time, the amount of memory needed is equal to the longest path of the overlay structure.

    This happens to be 70K for the example in Figure 16(b): procedures A, B and C. Figure 2(c) illustrates a storage assignment for each procedure consistent with the overlay structure. In order for the overlay structure to work it is necessary for the module loader to load the various procedures as they are needed. We will not go into their specific details, but there


    are many binders capable of processing and allocating an overlay structure. The portion of the loader that actually intercepts the calls and loads the necessary procedure is called the overlay supervisor or simply the flipper. This overall scheme is called dynamic loading or load-on-call.

    Figure 2 ( A )

    Figure 2 ( B )

    Figure 2 ( C )


    Figure 2 ( D )

    Fig. 2 : Dynamic Loading
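
    The "longest path" rule can be checked with a short Python sketch. The call structure is the one described in the text; the individual sizes, however, are assumed values (the text gives only the 100K total), chosen so that they add up to 100K. The function name heaviest_path is likewise hypothetical.

```python
# Sizes in K bytes are assumptions; only their 100K total comes from the text.
sizes = {"A": 20, "B": 20, "C": 30, "D": 10, "E": 20}
calls = {"A": ["B", "D", "E"], "B": ["C", "E"], "C": [], "D": ["E"], "E": []}


def heaviest_path(root):
    """Return (memory needed, procedures on the path) for the costliest call chain."""
    best_size, best_path = 0, []
    for callee in calls[root]:
        size, path = heaviest_path(callee)
        if size > best_size:
            best_size, best_path = size, path
    return sizes[root] + best_size, [root] + best_path


total_without_overlays = sum(sizes.values())
needed_with_overlays, path = heaviest_path("A")
```

    With these assumed sizes the costliest chain is A, B, C, so the overlay structure needs noticeably less memory than loading all five procedures at once.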

    DYNAMIC LINKING

    The major disadvantage of all of the previous loading schemes is that if a subroutine is referenced but never executed (e.g. if the programmer had placed a call statement in his program but this statement was never executed because some condition was not satisfied), the loader would still incur the overhead of linking the subroutine.

    Furthermore, all of these schemes require the programmer to explicitly name all procedures that might be called. A very general type of loading scheme is called dynamic linking. This is a mechanism by which loading and linking of external references are postponed until execution time. The loader loads only the main program. If the main program should execute a transfer instruction to an external address, or should reference an external variable (that is, a variable that has not been defined in this procedure segment), the loader is called. Only then is the segment containing the external reference loaded. An advantage here is that no overhead is incurred unless the procedure to be called or referenced is actually used. A further advantage is that the system can be dynamically reconfigured. The major drawback of this type of loading scheme is the considerable overhead and complexity incurred, due to the fact that we have postponed most of the binding process until execution time.
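
    A loose analogy in Python may help to fix the idea. The class DynamicLinker and its library of "segments" are hypothetical stand-ins: a real dynamic linker works on machine addresses, whereas this sketch only shows that a routine is brought in on its first use and not before.

```python
import math


class DynamicLinker:
    """Loads a segment only when control first reaches it (hypothetical model)."""

    def __init__(self, library):
        self.library = library    # segment name -> function that "loads" the routine
        self.loaded = {}          # segments already brought into memory
        self.load_log = []        # order in which segments were actually loaded

    def call(self, name, *args):
        if name not in self.loaded:              # first reference: invoke the loader
            self.loaded[name] = self.library[name]()
            self.load_log.append(name)
        return self.loaded[name](*args)


linker = DynamicLinker({
    "sqrt": lambda: math.sqrt,
    "report": lambda: (lambda text: text.upper()),
})


def main_program(value):
    if value >= 0:
        return linker.call("sqrt", value)
    return linker.call("report", "negative input")   # not loaded unless this runs


result = main_program(16.0)
```

    The "report" routine is named in the program but never loaded for this input. The price is the extra check on every call, which mirrors the run-time overhead mentioned above.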

    Now we will discuss the implementation of the simplest type of loader scheme, which is called an absolute loader.

    Implementation of an Absolute Loader

    Absolute loaders are simple to implement but they do have disadvantages. First, the programmer must specify to the assembler the address in memory where the program is to be loaded. Further, if there are multiple functions to be called within a program, the programmer must remember the address of each and use that absolute address explicitly in his other functions to perform linking of functions. Figure 3 illustrates the operation of an absolute loader. The programmer must be careful not to assign two functions to the same or overlapping addresses.


    Figure 3 : Absolute Loader

    The program first.c is assigned to locations 100-300 and the sqrt function is assigned to locations 400-450. If changes were made to first.c that increased its length to more than 300 bytes, the end of first.c (at 100 + 300 = 400) would overlap the start of sqrt (at 400). It would then be necessary to assign sqrt to a new address. Furthermore, it would also be necessary to modify all other functions that referred to sqrt. In situations where dozens of subroutines are being used, this manual shuffling can get very complex, tedious and wasteful of time and memory.
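
    The bookkeeping that the programmer has to do by hand can be sketched as a simple overlap check in Python. The function name find_overlaps and the (start, length) representation are assumptions; the start addresses follow the example, with first.c taken to start at 100 with a length of 300 bytes as in the arithmetic above.

```python
from itertools import combinations


def find_overlaps(assignments):
    """Return pairs of routines whose ranges [start, start + length) collide."""
    clashes = []
    for (name1, (start1, len1)), (name2, (start2, len2)) in combinations(
            sorted(assignments.items()), 2):
        if start1 < start2 + len2 and start2 < start1 + len1:
            clashes.append((name1, name2))
    return clashes


# Original layout: first.c occupies 100..399 and sqrt occupies 400..449.
layout = {"first.c": (100, 300), "sqrt": (400, 50)}
before = find_overlaps(layout)

# first.c grows by 20 bytes, so it now runs past address 400.
layout["first.c"] = (100, 320)
after = find_overlaps(layout)
```

    With an absolute loader nothing performs this check automatically; a clash simply means that one routine overwrites part of another when both are loaded.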

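    The four loader functions are accomplished as follows in an absolute loading scheme:

    (i) Allocation: by the programmer.

    (ii) Linking: by the programmer.

    (iii) Relocation: by the assembler.

    (iv) Loading: by the loader.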

    MACRO
    INCRMT  &A, &B
    LOAD    &A                          (Macro Definition)
    ADD     &B
    STORE   &A
    ENDM

    INCRMT  X, Y        LOAD    X       (Macro expansion)
                        ADD     Y
                        STORE   X
    ENDM

    Macro Program

    COMPILER

    The study of compiler design forms a central theme in the field of computer science. An understanding of the techniques used by high level language compilers can give the programmer a set of skills applicable in many aspects of software design - one does not have to be a compiler writer to make use of them.

    An assembler translates an assembly language program into machine language. Here we will look at another type of translator, called a compiler. Compiler writing is not confined to one discipline but rather spans several: programming languages, computer architecture, theory of programming languages, algorithms, etc. Today a few basic compiler writing techniques can be used to construct translators for a wide variety of languages. This unit is intended as an introduction to the basic essential features of compiler design.

    WHAT IS A COMPILER?

    A compiler is software (a program) that reads a program written in a source language and translates it into an equivalent program in another language - the target language (see figure 4). An important aspect of the compilation process is the production of diagnostics (error messages) about the source program. These error messages are mainly due to grammatical mistakes made by the programmer. A familiarity with the material covered in this unit will be a great help in understanding the inner functioning of a compiler.


    Fig. 4. A Compiler

    There are thousands of source languages, ranging from C and PASCAL to specialized languages that have arisen in virtually every area of computer application. Target languages also number in the thousands. A target language may be another programming language, a machine language or an assembly language. Compilers are classified as single-pass, multi-pass, debugging or optimizing, depending on how they have been constructed or on what functions they are supposed to perform. Earlier (in the 1950s) compilers were considered difficult programs to write.

    The first FORTRAN compiler, for example, took 18 staff-years to implement, but now several new techniques and tools have been developed for handling many of the important tasks that occur during the compilation process. Good implementation languages, programming environments (editors, debuggers, etc.) and software tools have also been developed. With these developments, compiler writing has become easier.

    Approaches To Compiler Development

    There are several approaches to compiler development. Here we will look at some of them.

    Assembly Language Coding

    Early compilers were mostly coded in assembly language. The main consideration was to increase efficiency. This approach worked very well for small High Level Languages (HLL). As languages and their compilers became larger, lots of bugs started surfacing which were difficult to remove. The major difficulty with assembly language implementation was poor software maintainability.

    Around this time, it was realised that coding compilers in a high level language would overcome this disadvantage of poor maintenance. Many compilers were therefore coded in FORTRAN, the only widely available HLL at that time. For example, the FORTRAN H compiler for the IBM/360 was coded in FORTRAN. Later many system programming languages were developed to ensure the efficiency of compilers written in an HLL. Assembly language is still being used, but the trend is towards compiler implementation through HLLs.

    Cross-Compiler

    A cross-compiler is a compiler which runs on one machine and generates code for another machine. The only difference between a cross-compiler and a normal compiler is in terms of the code generated by it. For example, consider the problem of implementing a Pascal compiler on a new piece of hardware (a computer called X) on which assembly language is the only programming language already available. Under these circumstances, the obvious approach is to write the Pascal compiler in assembler. Hence, the compiler in this case is a program that takes Pascal source as input, produces machine code for the target machine as output and is written in the assembly language of the target machine. The languages characterizing this compiler can be represented as:


    This representation shows that Pascal source is translated by a program written in X assembly language (the compiler) running on machine X into X's object code. This code can then be run on the target machine. This notation is essentially equivalent to the T-diagram. The T-diagram for this compiler is shown in figure 5.

    Fig. 5 T-diagram

    The language accepted as input by the compiler is stated on the left, the language output by the compiler is shown on the right and the language in which the compiler is written is shown at the bottom. The advantage of this particular notation is that several T-diagrams can be meshed together to represent more complex compiler implementation methods. This compiler implementation involves a great deal of work since a large assembly language program has to be written for X. It is to be noticed in this case that the compiler is very machine specific; that is, not only does it run on X but it also produces machine code suitable for running on X. Furthermore, only one computer is involved in the entire implementation process.

    The use of a high-level language for coding the compiler can offer great savings in implementation effort. If the language in which the compiler is being written is already available on the computer in use, then the process is simple. For example, Pascal might already be available on machine X, thus permitting the coding of, say, a Modula-2 compiler in Pascal.

    Such a compiler can be represented as:


    If the language in which the compiler is being written is not available on the machine, then all is not lost, since it may be possible to make use of an implementation of that language on another machine. For example, a Modula-2 compiler could be implemented in Pascal on machine Y, producing object code for machine X:

    The object code for X generated on machine Y would of course have to be transferred to X for its execution. This process of generating code on one machine for execution on another is called cross-compilation.
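
    The T-diagram can be modelled as a record with three fields, which makes the "meshing" of diagrams easy to experiment with. The Python sketch below is an illustration only: the class Compiler, the helper functions and the labels "X code" and "Y code" (standing for the machine languages of X and Y) are assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    source: str       # language accepted (left arm of the T)
    target: str       # language produced (right arm of the T)
    written_in: str   # language the compiler itself is expressed in (foot of the T)


def is_cross_compiler(compiler, host_machine_code):
    """True when the compiler runs on the host but emits code for another machine."""
    runs_on_host = compiler.written_in == host_machine_code
    return runs_on_host and compiler.target != host_machine_code


def compile_compiler(program, translator):
    """Mesh two T-diagrams: translate the compiler `program` using `translator`."""
    if program.written_in != translator.source:
        raise ValueError("translator does not accept the implementation language")
    return Compiler(program.source, program.target, translator.target)


# A Modula-2 compiler written in Pascal that generates code for machine X ...
m2_in_pascal = Compiler("Modula-2", "X code", "Pascal")
# ... is compiled by the Pascal compiler already available on machine Y.
pascal_on_y = Compiler("Pascal", "Y code", "Y code")
m2_on_y = compile_compiler(m2_in_pascal, pascal_on_y)
```

    The result runs on Y but produces code for X, which is exactly the cross-compilation arrangement described above.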

    At first sight, the introduction of a second computer to the compiler implementation plan seems to offer a somewhat inconvenient solution. Each time a compilation is required, it has to be done on machine Y and the object code transferred, perhaps via a slow or laborious mechanism, to machine X for execution. Furthermore, both computers have to be running and inter-linked somehow for this approach to work.

    BOOTSTRAPPING

    Bootstrapping is the concept of developing a compiler for a language by using subsets (small parts) of the same language. Suppose that a Modula-2 compiler is required for machine X, but that the compiler is to be coded in Modula-2. Coding the compiler in the language it is to compile is nothing special and, as will be seen, it has a great deal in its favour. Suppose further that Modula-2 is already available on machine Y. In this case, the compiler can be run on machine Y, producing object code for machine X:

    This is the same situation as before except that the compiler is coded in Modula-2 rather than Pascal. The special feature of this approach appears in the next step. The compiler, running on Y, is nothing more than a large program written in Modula-2. Its function is to transform an input file of Modula-2 statements into a functionally equivalent sequence of statements in X's machine code.

    Therefore, the source statements of this Modula-2 compiler can be passed into itself running on Y to produce a file containing X's machine code. This file is of course a Modula-2 compiler, which is capable of being run on X. By making the compiler compile itself, a version of the compiler that runs on X has been created.


    Once this machine code has been transferred to X, a self-sufficient Modula-2 compiler is available on X; hence there is no further use for machine Y in supporting Modula-2 compilation.

    This implementation plan is very attractive. Machine Y is only required for compiler development, and once this development has reached the stage at which the compiler can (correctly) compile itself, machine Y is no longer required. Consequently, the original compiler implemented on Y need not be of the highest quality - for example, optimization can be completely disregarded. Further development (and obviously conventional use) of the compiler can then continue at leisure on machine X.

    This approach to compiler implementation is called bootstrapping. Many languages, including C, Pascal, FORTRAN and LISP, have been implemented in this way.
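
    The two compilation stages of the Modula-2 example can be traced with a small Python sketch. It records only which languages are involved at each stage; the tuple type Compiler, the function translate and the labels "X code" and "Y code" are assumptions made for illustration, not part of any real toolchain.

```python
from collections import namedtuple

# (language accepted, language produced, language the compiler is written in)
Compiler = namedtuple("Compiler", "source target written_in")


def translate(program, translator):
    """Run `translator` over the source text of the compiler `program`."""
    if program.written_in != translator.source:
        raise ValueError("translator cannot read this compiler's source")
    return program._replace(written_in=translator.target)


# The new compiler: Modula-2 to X code, itself written in Modula-2.
new_compiler_source = Compiler("Modula-2", "X code", "Modula-2")
# The Modula-2 compiler already available on machine Y.
existing_on_y = Compiler("Modula-2", "Y code", "Y code")

# Stage 1 (on Y): the result runs on Y and generates code for X.
stage1 = translate(new_compiler_source, existing_on_y)
# Stage 2 (still on Y): the compiler compiles its own source; the output runs on X.
stage2 = translate(new_compiler_source, stage1)
```

    After stage 2 the compiler is expressed in X's machine code and also generates X's machine code, so machine Y is no longer needed.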

    Pascal was first implemented by writing a compiler in Pascal itself. This was done through several

    bootstrapping processes. The compiler was then translated "by hand" into an available low level

    language.

    Compiler Designing Phases

    The compiler, being a complex program, is developed through several phases. Each phase transforms the source program from one representation to another. The tasks of a compiler can be divided very broadly into two sub-tasks:

    (i) The analysis of a source program

    (ii) The synthesis of the object program

    In a typical compiler, the analysis task consists of 3 phases:

    (i) Lexical analysis

    (ii) Syntax analysis

    (iii) Semantic analysis

    The synthesis task is usually considered as a code generation phase, but it can be divided into some other distinct phases like intermediate code generation and code optimization. These four phases, functioning in sequence, are shown in figure 6. Code optimization is beyond the scope of this unit.

    The nature of the interface between these four phases depends on the compiler. It is perfectly possible for the four phases to exist as four separate programs.

    Fig. 6 Compiler Design Phases

    Lexical Analysis

    Lexical analysis is the first phase of a compiler. Lexical analysis, also called scanning, scans a source program from left to right, character by character, and groups the characters into tokens having a collective meaning. It performs two important tasks. First, it scans the source program character by character from left to right and groups the characters into tokens (or syntactic elements). Each token or basic syntactic element represents a logically cohesive sequence of characters such as an identifier (also called a variable), a keyword (if, then, else, etc.), a multi-character operator (<=, etc.). The output of this phase goes to the next phase, i.e., syntax analysis or parsing. The interaction between the two phases is shown below in figure 7.


    Fig. 7 Interaction between the first two phases
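
    The grouping of characters into tokens can be tried out with a minimal scanner written in Python. This is a sketch rather than a Pascal lexer: the token classes, the small keyword set and the sample statement are assumptions chosen for illustration, and comments are not handled.

```python
import re

KEYWORDS = {"for", "to", "do", "if", "then", "else"}

# One named alternative per token class; earlier alternatives take priority.
TOKEN_PATTERN = re.compile(r"""
    (?P<NUMBER>      \d+ )
  | (?P<IDENTIFIER>  [A-Za-z_][A-Za-z_0-9]* )
  | (?P<OPERATOR>    <=|>=|<>|:=|[-+*/=<>] )
  | (?P<PUNCTUATION> [\[\]();,] )
  | (?P<SKIP>        \s+ )
  | (?P<ERROR>       . )
""", re.VERBOSE)


def tokenize(text):
    """Scan left to right and return a list of (token class, lexeme) pairs."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":                   # blanks and tabs are discarded
            continue
        if kind == "ERROR":                  # diagnostic for an illegal character
            raise SyntaxError(f"unexpected character {lexeme!r} at position {match.start()}")
        if kind == "IDENTIFIER" and lexeme.lower() in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens


tokens = tokenize("For i = 1 To 50 do sum = sum + x [i];")
```

    Listing the two-character operators before the single-character ones ensures that <= is returned as one token rather than as < followed by =.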

    The second task performed during lexical analysis is to enter each token into a symbol table if it is not already there. Some other tasks performed during lexical analysis are:

    (i) to remove all comments, tabs, blank spaces and machine characters;

    (ii) to produce error messages (also called diagnostics) for errors that occur in a source program.

    Let us consider the following Pascal language statement.

    For i = 1 To 50 do sum = sum + x [i]; sum of numbers stored in array x

    After going through the statement, the lexical analysis transforms it into the sequence of tokens