advanced microprocessors assemblers and loaders

8/13/2019 Advanced Microprocessors Assemblers and Loaders



• Assembler: a program that accepts an assembly language program as input, converts it into an object module, and prepares it for loading into memory for execution.

• Loader (linker): further converts the object module prepared by the assembler into executable form by linking it with other object modules and library modules.

• The final executable map of the assembly language program is prepared by the loader at the time of loading into primary memory for actual execution.

• The assembler prepares the relocation and linkage information (subroutines, ISRs) for the loader.

• The operating system, which controls the memory to be allotted to the program, passes to the loader the memory address at which the program is to be loaded and the map of the available memory.

• Based on this information and the information generated by the assembler, the loader generates an executable map of the program, physically loads it into memory and transfers control to it for execution.

• Thus the basic task of an assembler is to generate the object module and prepare the loading and linking information.

Procedure for assembling a program

• Assembling a program proceeds statement by statement, sequentially.

• The first phase of assembling analyzes the program to be converted. This phase, called Pass 1, defines and records the symbols, pseudo operands and directives. It also analyses the segments used by the program, the types and labels, and their memory requirements.

• The second phase (Pass 2) looks for the addresses and data assigned to the labels. It also finds the codes of the instructions from the instruction machine code database and the program data.

• It processes the pseudo operands and directives.

• It is the task of the assembler designer to select suitable strings to use as directives, pseudo operands or reserved words, and to decide the syntax.
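The two passes can be illustrated with a toy model. The sketch below is only an illustration and not a real 8086 assembler: the three mnemonics, their opcode values and the fixed 2-byte instruction size are invented for the example.

```python
# Toy two-pass assembler. The instruction set, opcode values and the fixed
# 2-byte instruction size are hypothetical, chosen only to show the passes.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03}
INSTRUCTION_SIZE = 2  # one opcode byte + one operand byte (assumption)


def pass1(lines):
    """Pass 1: record every label and the address it stands for."""
    symbols, address = {}, 0
    for line in lines:
        if ":" in line:
            label, line = line.split(":", 1)
            symbols[label.strip()] = address
        if line.strip():
            address += INSTRUCTION_SIZE
    return symbols


def pass2(lines, symbols):
    """Pass 2: emit machine code, replacing labels by their addresses."""
    code = []
    for line in lines:
        if ":" in line:
            line = line.split(":", 1)[1]
        if not line.strip():
            continue
        mnemonic, operand = line.split()
        value = symbols[operand] if operand in symbols else int(operand, 0)
        code += [OPCODES[mnemonic], value]
    return bytes(code)


def assemble(source):
    lines = source.strip().splitlines()
    return pass2(lines, pass1(lines))


program = """
START: LOAD 5
       ADD 3
       JMP END
END:   JMP START
"""
print(assemble(program).hex(" "))
```

The forward reference to END in the third statement is the reason two passes are needed: its address is only known after Pass 1 has seen the whole program.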

Directives

• Also called pseudo operations; they control the assembly process.

• They indicate how an operand or a section of a program is to be processed by the assembler.

• They generate and store information in the memory.

Assembler Memory models

• Each model defines the way that a program is stored in the memory system.
• Tiny: all code and data fit into one segment; the program is written in .COM format.
• Small: has two segments, one for data and one for code.
• There are several other models too.

Directives for storing data in a memory segment

• DB define byte
• DW define word
• DD define double word
• DQ define quad word (8 bytes)
• DT define 10 bytes

Example
Data1 DB 10H,11H,12H
Data2 DW 1234H
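As a rough check of what these two directives place in memory, the following Python sketch packs the same values with the standard struct module. It assumes the data starts at offset 0 and uses the little-endian byte order of the 8086 (low byte of a word first).

```python
import struct

# Data1 DB 10H,11H,12H  -> three consecutive bytes
data1 = struct.pack("<3B", 0x10, 0x11, 0x12)
# Data2 DW 1234H        -> one 16-bit word, low byte first (little-endian)
data2 = struct.pack("<H", 0x1234)

segment = data1 + data2
for offset, byte in enumerate(segment):
    print(f"{offset:04X}: {byte:02X}")
```

The word 1234H appears as 34H followed by 12H.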

• SEGMENT: statement that indicates the start of a segment and gives its symbolic name.
• Example

Name SEGMENT
Variable_name DB …….
Variable_name DW …….
Name ENDS

Data SEGMENT
Data1 DB …….
Data2 DW …….
Data ENDS

Code SEGMENT
START: MOV AX,BX
………
Code ENDS

Similarly the stack segment is also declared.

• For small models
.DATA
……
ENDS

The ENDS directive indicates the end of the segment.

• Memory is reserved for future use by giving ? as the operand of a DB, DW or DD directive. The assembler sets aside a location and does not initialize it to any specific value (usually a zero is stored). The DUP (duplicate) directive creates an array of such locations.

• Example

Data1 DB 5 DUP(?)
This reserves 5 bytes of memory for an array Data1 and leaves each location uninitialized.

• ALIGN: stores a memory array on word boundaries.
• Example

ALIGN 2 means storing from an even address

Address 0 XX
Address 1 YY
Address 2 XX

The data XX is aligned to the even address.

• ASSUME, EQU, ORG

• ASSUME tells the assembler what names have been chosen for the code, data, extra and stack segments. It informs the assembler that the register CS will hold the address allotted by the loader to the label CODE and that DS will similarly hold the address of the label DATA. ASSUME does not itself load the registers; the program must still load DS explicitly.

• Example
ASSUME CS: name of the code segment
ASSUME DS: name of the data segment

ASSUME CS: Code1, DS: Data1

• EQU: equates a numeric value, an ASCII (American Standard Code for Information Interchange) value or a label to another label.

• Example
Data SEGMENT
Num1 EQU 50H
Num2 EQU 66H
Data ENDS

The numeric values 50H and 66H are assigned to Num1 and Num2.

• ORG: changes the starting offset address of the data in the data segment.
• Example

ORG 100H
Data1 DB 10H    (Data1 is stored at offset 100H)
It can be used for code too.

• PROC and ENDP: indicate the start and end of a procedure. They require a label that gives the name of the procedure.

• NEAR: the procedure resides in the same code segment (local).
• FAR: the procedure resides at any location in the memory.
• Example

Addem PROC NEAR
ADD AX,BX
MOV CX,AX
RET
Addem ENDP

The procedure is named Addem rather than Add because ADD is a reserved instruction mnemonic. When the PROC directive carries a USES clause, the assembler saves the listed registers on the stack.

• EXTRN and PUBLIC: EXTRN informs the assembler that the names of procedures and labels declared after this directive have already been defined in some other assembly language module. PUBLIC makes the names declared after it available to other modules.
• Example

If you want to call a Factorial procedure of Module1 from Module2, it must be declared as PUBLIC in Module1 and as EXTRN in Module2.

• Example
A sample for full segment definition

Data SEGMENT
Num1 DB 10H
Num2 DB 20H
Num3 EQU 30H
Data ENDS

ASSUME CS:Code,DS:Data
Code SEGMENT
START: MOV AX,Data     ; load the data segment address
MOV DS,AX
MOV AL,Num1            ; Num1 and Num2 are bytes, so 8-bit registers are used
MOV CL,Num2
ADD AL,CL
Code ENDS

• Example
A sample for small model

.MODEL SMALL
.Data
Num1 DB 10H
Num2 DB 20H
Num3 EQU 30H
.Code
HERE: MOV AX,@Data     ; load the data segment address
MOV DS,AX
MOV AL,Num1            ; Num1 and Num2 are bytes, so 8-bit registers are used
MOV CL,Num2
ADD AL,CL

QUESTIONS

1: What is a directive?
2: Describe the purpose of the DB, DW and DQ directives.
3: What is the purpose of the .386 directive?
4: What does START indicate?

--------------------------------------------------------------------------------------------------------------------------------------------











Addition:

There are two instructions, ADD and ADC.

Register Addition:

ADD AL,BL      AL=AL+BL
ADD CX,DI      CX=CX+DI
ADD CL,10H     CL=CL+10H
ADD [BX],AL    the contents of AL are added to the contents of the memory location addressed by BX and the result is stored in the same memory location

Example

ADD AL,BL      AL=10H BL=30H the result AL=40H
ADD AX,[SI+2]  the word contents of the data segment memory location addressed by SI+2 are added to AX and the result is stored in AX

Example

AX=1234H SI=2000 SI+2=2002 and let the word stored in memory location 2002 be 1122H. The result AX=2356H

ADD BYTE PTR [DI],3   3 is added to the byte contents of the data segment memory location addressed by DI

Example

DI=2000 and the contents of that memory location is 11H. The contents of address 2000 will be 14H after the execution of this instruction.

The contents of the flag register change after the addition operation. The flags affected are SIGN, CARRY, ZERO, AUX CARRY, PARITY and OVERFLOW. The interrupt, trap and other flags are not affected.
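How the six flags follow from an 8-bit addition can be sketched in Python. This is a simplified model written for these notes, not processor documentation; the function name add8 and the dictionary layout are choices made for the illustration.

```python
def add8(a, b, carry_in=0):
    """Model of an 8-bit ADD (or ADC when carry_in=1); returns (result, flags)."""
    total = a + b + carry_in
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                               # carry out of bit 7
        "ZF": int(result == 0),                                # result is zero
        "SF": result >> 7,                                     # copy of bit 7
        "AF": int((a & 0x0F) + (b & 0x0F) + carry_in > 0x0F),  # carry out of bit 3
        "PF": int(bin(result).count("1") % 2 == 0),            # even number of 1 bits
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }
    return result, flags


result, flags = add8(0x10, 0x30)   # ADD AL,BL with AL=10H, BL=30H
print(f"AL={result:02X}H", flags)
```

Trying add8(0x7F, 0x01) shows a case where SIGN and OVERFLOW are set although CARRY is not.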

Immediate Addition

An 8 bit immediate data is added.

Example

MOV AL,10H
ADD AL,30H
The result AL=40H

Memory to Register addition

Example

MOV AL,0
ADD AL,[DI]
ADD AL,[DI+1]
Let DI=2000 and let the contents of this memory location be 22H. After the first ADD, AL will have 22H+0=22H. Then DI+1=2001; let its contents be 11H.

The result will be 33H

Array addition

The offset address of the array is moved to the SI or DI register.

Example

MOV AL,0
MOV SI,OFFSET Array
ADD AL,[SI]
ADD AL,[SI+2]
ADD AL,[SI+4]

Array
Offset address   Contents
2000             10H
2001             11H
2002             22H
2003             33H
2004             44H

After the first ADD, AL will be 0+10=10H
After the second ADD, AL will be 10+22=32H
After the third ADD, AL will be 32+44=76H

Increment addition

INC adds 1 to a register or a memory location.

Example

INC AX
This instruction adds one to the contents of AX. Let AX=1234H; the result will be AX=1235H.

INC BYTE PTR [DI]
This instruction adds one to the byte contents of the data segment location addressed by DI.

Addition with carry

ADC adds the bit in the carry flag to the operand data.

Example

ADC AL,BH      AL=AL+BH+CARRY
ADC CX,AX      CX=CX+AX+CARRY
ADC BX,[BP+2]  the word contents of the stack segment memory location addressed by BP+2 are added to BX with carry and the result is stored in BX.
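The usual reason for ADC is to add numbers wider than a register: the low halves are added with ADD and the high halves with ADC, so the carry propagates. A minimal Python sketch of a 32-bit addition done with 16-bit steps follows; the register names in the comments are only an example pairing, not taken from the notes.

```python
def add32_with_16bit_ops(x, y):
    """Add two 32-bit numbers using one 16-bit ADD and one 16-bit ADC."""
    low = (x & 0xFFFF) + (y & 0xFFFF)          # ADD AX,CX  (low words)
    carry = low >> 16                          # carry flag after the ADD
    high = (x >> 16) + (y >> 16) + carry       # ADC BX,DX  (high words + carry)
    result = ((high & 0xFFFF) << 16) | (low & 0xFFFF)
    return result, high >> 16                  # 32-bit sum and final carry


total, carry_out = add32_with_16bit_ops(0x0001FFFF, 0x00000001)
print(f"{total:08X}H carry={carry_out}")
```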

Subtraction

Many forms of subtraction are available; they can be used with any addressing mode and with 8, 16 and 32 bit data.

SUB subtract
SBB subtract with borrow

Register Subtraction:

SUB AL,BL      AL=AL-BL
SUB CL,10H     CL=CL-10H
The carry flag holds the borrow.

Decrement

A 1 is subtracted from the register or the memory location.

Example

DEC AX
DEC BYTE PTR [DI]
DEC CL
DEC BL

Each instruction subtracts 1 from a register or a memory location.

CMP

CMP is a subtraction that changes only the flags; the destination operand never changes. It is usually followed by a conditional jump instruction, which tests the condition against the flags.

Multiplication

Multiplication is performed on bytes, words or double words, and the operands can be signed or unsigned integers.
MUL: unsigned
IMUL: signed
Flags affected: CARRY, OVERFLOW

8 Bit multiplication

Example
MOV BL,05H
MOV AL,10H
MUL BL
The multiplicand is in AL.
The multiplier is in BL (even a memory location can be used).
The product, 0050H, is placed in AX.

8 Bit signed multiplication

Example
IMUL BYTE PTR [BX]

AL is multiplied by the byte contents of the data segment memory location addressed by BX; the signed product is placed in AX.

For signed multiplication the product is in true binary form if positive and in two's complement form if negative.

Example
AL 00000010 BL 11111100
AL contains +2 and BL contains -4
IMUL BL
The product is -8
The product is stored in AX in two's complement form
AX 1111 1111 1111 1000
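The signed example can be checked with a few lines of Python. The sketch models an 8-bit IMUL by interpreting both bytes as two's complement values and truncating the product to the 16 bits of AX; the helper names are made up for the illustration. In two's complement, -4 in a byte is 1111 1100.

```python
def to_signed8(byte):
    """Interpret an 8-bit pattern as a two's complement number."""
    return byte - 0x100 if byte & 0x80 else byte


def imul8(al, operand):
    """Model of 8-bit IMUL: AX = signed(AL) * signed(operand)."""
    product = to_signed8(al) * to_signed8(operand)
    return product & 0xFFFF        # 16-bit two's complement pattern in AX


ax = imul8(0b00000010, 0b11111100)   # +2 times -4
print(f"AX={ax:016b}")               # bit pattern of -8
```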

Division

DIV, IDIV

The dividend is always double width: it is twice as wide as the operand that divides it. An 8 bit division divides a 16 bit number by an 8 bit number.
Errors: divide by zero, divide overflow

The AX register stores the dividend, which is divided by the contents of any 8 bit register or memory location.
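The two error cases can be made concrete with a small Python model of the unsigned 8 bit DIV. On the 8086 the quotient is returned in AL and the remainder in AH; the overflow error arises when the quotient does not fit into AL. The Python exceptions stand in for the processor's divide-error interrupt and are only an illustration.

```python
def div8(ax, divisor):
    """Model of unsigned 8-bit DIV: returns (AL=quotient, AH=remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divide by zero")
    quotient, remainder = divmod(ax, divisor)
    if quotient > 0xFF:
        raise OverflowError("divide overflow: quotient does not fit in AL")
    return quotient, remainder


al, ah = div8(0x0064, 0x07)        # 100 divided by 7
print(f"AL={al:02X}H AH={ah:02X}H")
```

Dividing AX=1000H by 2 would give 800H, which is too large for AL, so the model raises the overflow error.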











BCD Arithmetic:

The microprocessor allows manipulation of BCD and ASCII data.

BCD is used in cash registers and ASCII is used by many programs.

There are two instructions
DAA decimal adjust after addition
DAS decimal adjust after subtraction

Both instructions correct the result of an operation on packed BCD numbers, which hold 2 digits/byte (the unpacked form holds 1 digit/byte). They function with AL only.

DAA decimal adjust after addition

The result is in AL
The logic of this instruction

If lower nibble>9 or AF=1 add 06
After adding 06 if upper nibble>9 or CF=1 add 60
DAA instruction follows ADD or ADC

Example 1

ADD AL,CL
DAA

Let AL=53 and CL=29
AL=53+29
AL=7C
AL=7C+06 (as C>9)
AL=82

Example 2

Let AL=73 CL=29
AL=73+29
AL=9C
AL=9C+06 (as C>9)
AL=A2
AL=A2+60=02 and CF=1

The instruction affects AF, CF, PF and ZF

Example 3

MOV DX,1234H
MOV BX,3099H
MOV AL,BL
ADD AL,DL
DAA
MOV CL,AL
MOV AL,BH
ADC AL,DH
DAA
MOV CH,AL

BL=99H DL=34H
99+34=CD
AL=CD+6 (D>9)
AL=D3
AL=D3+60 (D>9)
AL=33 and CF=1

BH=30 DH=12
AL=30+12+CF
AL=43
DAA does not do anything
The result is placed in CX=4333
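The DAA rule stated above can be traced in Python. The function below performs the binary addition and then applies the two correction steps exactly as the notes describe them; it is a sketch for valid packed BCD inputs, and the name bcd_add is chosen for the illustration.

```python
def bcd_add(a, b, carry_in=0):
    """ADD/ADC followed by DAA on packed BCD bytes; returns (AL, CF)."""
    total = a + b + carry_in                              # binary ADD or ADC
    af = int((a & 0x0F) + (b & 0x0F) + carry_in > 0x0F)   # auxiliary carry
    cf = int(total > 0xFF)                                # carry
    al = total & 0xFF
    if (al & 0x0F) > 9 or af:      # lower nibble > 9 or AF=1: add 06
        al += 0x06
    if (al >> 4) > 9 or cf:        # upper nibble > 9 or CF=1: add 60
        al += 0x60
        cf = 1
    return al & 0xFF, cf


# Example 3: 1234 + 3099 as two packed BCD bytes each
low, carry = bcd_add(0x99, 0x34)           # ADD AL,DL then DAA
high, _ = bcd_add(0x30, 0x12, carry)       # ADC AL,DH then DAA
print(f"CX={high:02X}{low:02X}")
```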

DAS instruction follows subtraction

The result is in AL
Logic of this instruction

If lower nibble>9 or AF=1 subtract 06
After subtracting 06 if upper nibble>9 or CF=1 subtract 60
The instruction affects AF, CF, PF and ZF

Example 1

SUB AL,BH
DAS
Let AL=75 BH=46
AL=75-46=2F AF=1
AL=2F-6 (F>9)
AL=29

Example 2

SUB AL,CH
DAS
AL=38 CH=61
AL=38-61=D7 CF=1 (borrow)
AL=D7-60 (D>9)
AL=77 CF=1 (borrow)

Example 3

MOV DX,1234H
MOV BX,3099H
MOV AL,BL
SUB AL,DL
DAS
MOV CL,AL
MOV AL,BH
SBB AL,DH
DAS
MOV CH,AL

AL=99-34=65
DAS has no effect
AL=30-12=1E
AL=1E-06 (E>9)
AL=18
The result is 1865 placed in CX
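The same kind of trace works for DAS. In the sketch below both corrections are subtractions (06 from the lower digit, 60 from the upper digit); it assumes valid packed BCD inputs, and the name bcd_sub is chosen for the illustration.

```python
def bcd_sub(a, b, borrow_in=0):
    """SUB/SBB followed by DAS on packed BCD bytes; returns (AL, CF)."""
    diff = a - b - borrow_in                               # binary SUB or SBB
    af = int((a & 0x0F) - (b & 0x0F) - borrow_in < 0)      # borrow from bit 4
    cf = int(diff < 0)                                     # borrow out of the byte
    al = diff & 0xFF
    if (al & 0x0F) > 9 or af:      # lower nibble > 9 or AF=1: subtract 06
        al -= 0x06
    if (al >> 4) > 9 or cf:        # upper nibble > 9 or CF=1: subtract 60
        al -= 0x60
        cf = 1
    return al & 0xFF, cf


# Example 3: 3099 - 1234 as two packed BCD bytes each
low, borrow = bcd_sub(0x99, 0x34)          # SUB AL,DL then DAS
high, _ = bcd_sub(0x30, 0x12, borrow)      # SBB AL,DH then DAS
print(f"CX={high:02X}{low:02X}")
```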

ASCII Arithmetic

Functions with ASCII coded numbers

The numbers range from 30H-39H for 0-9
AAA, AAD, AAM and AAS use AX as source and destination

AAA

Example
Add 31H and 39H: the result is 6AH. It should have been 10 decimal, which is 31H and 30H in ASCII.
AAA is used to correct the answer

Converts the resulting contents of AL to unpacked decimal digits

AH is cleared before the addition.
The AAA instruction examines the lower 4 bits of AL for a valid BCD number and checks AF.
If the lower nibble of AL is between 0 and 9 and AF=0, no correction is made and AF and CF are cleared.
If the lower nibble of AL is greater than 9, or AF=1, AL is incremented by 06 and AH by 1, and AF and CF are set.
In both cases the higher 4 bits of AL are cleared to 0.
To get the exact sum in ASCII add 3030H to AX

AAS

Corrects the result in AL after subtracting two unpacked ASCII operands
The result is in unpacked decimal format
If the lower 4 bits of AL are >9 or if AF=1 then AL=AL-6 and AH=AH-1, and CF and AF are set; otherwise CF and AF are cleared to 0 and no correction is made
As a result the upper nibble of AL is 0 and the lower nibble may be any number from 0-9











AAM

Follows the multiplication instruction after multiplying two unpacked BCD numbers
Converts the product available in AL into unpacked BCD
The lower digit of the result is in AL and the upper digit in AH

Example
Let AL contain 5DH (93 decimal)
AAM divides AL by 10
The quotient 09 is the upper unpacked digit and is placed in AH
The remainder 03 is the lower unpacked digit and is placed in AL
After execution AH=09 and AL=03

MOV AL,5
MOV CL,5
MUL CL
AAM

This leaves AX=0205H, the unpacked BCD form of 25.

AAM accomplishes the conversion by dividing AL by 10

Benefit of AAM: it converts from binary to unpacked BCD
Use of AAM for conversion

XOR DX,DX
MOV CX,100
DIV CX
AAM
ADD AX,3030H
XCHG AX,DX
AAM
ADD AX,3030H
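Both adjustments are short enough to model directly. The Python sketch below follows the 8086 behaviour of AAA (add 6 to AL and 1 to AH when the low nibble exceeds 9 or AF is set, then clear the high nibble of AL) and of AAM (divide AL by 10, quotient to AH, remainder to AL). The function names and the packing of AH and AL into one integer are conventions of the sketch.

```python
def aaa(ax, af=0):
    """Model of AAA: adjust AL after adding two ASCII/unpacked BCD digits."""
    ah, al = ax >> 8, ax & 0xFF
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF
        ah = (ah + 1) & 0xFF
    al &= 0x0F                      # high nibble of AL is always cleared
    return (ah << 8) | al


def aam(al):
    """Model of AAM: AH = AL // 10, AL = AL % 10."""
    ah, al = divmod(al, 10)
    return (ah << 8) | al


ax = aaa(0x31 + 0x39)               # 31H + 39H = 6AH, AH=0 before the addition
print(f"{ax + 0x3030:04X}H")        # ASCII digits of the sum
print(f"{aam(5 * 5):04X}H")         # unpacked BCD of the product 25
```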

AAD

Appears before division
Requires AX to contain a two digit unpacked BCD number (not ASCII) before executing
After adjusting AX with AAD, it is divided by an unpacked BCD number to generate a single digit result in AL with the remainder in AH

Example
.MODEL
.CODE
.STARTUP
MOV AL,48H
MOV AH,0
AAM
ADD AX,3030H
MOV DL,AH
MOV AH,2
PUSH AX
INT 21H
POP AX
MOV DL,AL











Shift instructions

They manipulate binary numbers and are used to control I/O devices. A shift operation moves the number either to the left or to the right within a memory location or a register. There are four instructions (SHL, SAL, SHR and SAR) and two types of shift: (1) arithmetic and (2) logical. A shift left by one position is equivalent to multiplying by 2 and a shift right by one position to dividing by 2. The data is shifted by one position, or by the number of positions held in CL.

Shift left operation

Logical left: The contents of the register or memory location are shifted left by one position; the MSB moves to the carry flag and a zero is added to the LSB position.

Example

SHL AX,1
AX=0000 1111 0000 1111 and Carry=1

After the execution of the instruction

AX=0001 1110 0001 1110 and Carry=0

Example

MOV CL,3
SHL DX,CL

The contents of the DX register are shifted left by three positions.

Arithmetic left: It is the same as logical left shift.

Logical right: The contents of the register or memory location are shifted right by one position; the LSB moves to the carry flag and a zero is added to the MSB position.

Example

SHR AX,1
AX=0000 1111 0000 1111 and Carry=0

Result
AX=0000 0111 1000 0111 and Carry=1

Arithmetic right: The contents of the register or memory location are shifted right by one position; the LSB moves to the carry flag and the sign bit is copied back into the MSB position.

Example

SAR AX,1
AX=1000 0000 0000 1111 and Carry=0

Result
AX=1100 0000 0000 0111 and Carry=1











Example

SAR SI,3

SI= 1010 1100 1010 0101 C=0

After first shift   SI= 1101 0110 0101 0010 C=1
After second shift  SI= 1110 1011 0010 1001 C=0
After third shift   SI= 1111 0101 1001 0100 C=1

All condition flags are affected
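The three kinds of single-position shift differ only in which bit goes to the carry and what is fed into the vacated position. A minimal 16-bit model in Python, with function names chosen for the illustration, reproduces the traces above:

```python
def shl16(value):
    """Logical/arithmetic left: MSB goes to carry, 0 enters at the LSB."""
    return (value << 1) & 0xFFFF, value >> 15


def shr16(value):
    """Logical right: LSB goes to carry, 0 enters at the MSB."""
    return value >> 1, value & 1


def sar16(value):
    """Arithmetic right: LSB goes to carry, the sign bit is kept in the MSB."""
    return (value >> 1) | (value & 0x8000), value & 1


si = 0b1010_1100_1010_0101          # SAR SI,3 done as three single shifts
for step in range(3):
    si, carry = sar16(si)
    print(f"shift {step + 1}: SI={si:016b} C={carry}")
```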

Rotation instructions

There are four rotate instructions (ROL, ROR, RCL and RCR).

Rotate left: The contents of the memory location or the register are rotated left by the number of positions indicated in the instruction, either through the carry (RCL) or without the carry (ROL).

ROL BL,4

Let BL=0001 0110 C=0
After first rotate   C=0 BL= 0010 1100
After second rotate  C=0 BL= 0101 1000
After third rotate   C=0 BL= 1011 0000
After fourth rotate  C=1 BL= 0110 0001

Rotate right

The contents of the memory location or the register are rotated right by the number of positions indicated in the instruction, either through the carry (RCR) or without the carry (ROR).
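The difference between rotating without the carry and through the carry shows up in the bit that re-enters at the LSB. The following Python sketch models the two 8-bit left rotates (the function names are chosen for the illustration): ROL feeds the outgoing MSB straight back in, while RCL feeds in the previous carry.

```python
def rol8(value, count):
    """Rotate left without carry: the MSB re-enters at the LSB and is copied to CF."""
    cf = 0
    for _ in range(count):
        cf = value >> 7
        value = ((value << 1) & 0xFF) | cf
    return value, cf


def rcl8(value, count, cf=0):
    """Rotate left through carry: the old CF enters at the LSB, the MSB becomes CF."""
    for _ in range(count):
        new_cf = value >> 7
        value = ((value << 1) & 0xFF) | cf
        cf = new_cf
    return value, cf


print("ROL:", [f"{v:08b} C={c}" for v, c in (rol8(0b00010110, n) for n in range(1, 5))])
print("RCL:", [f"{v:08b} C={c}" for v, c in (rcl8(0b00010110, n) for n in range(1, 5))])
```

Starting from BL=0001 0110 with C=0, four ROL steps end in 0110 0001, whereas four RCL steps end in 0110 0000; in both cases C=1.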

String instructions

REP is a prefix used with string instructions.

REPE/REPZ
REPNE/REPNZ
These are used with the CMPS and SCAS instructions.
These instructions are used in the program as prefixes.

CMPS

Compare string byte or string word
Only the flags are affected
The zero flag is set if the strings match, otherwise it is reset
DS:SI and ES:DI are used to point to the two strings

SCAS

Scans a string of bytes or words for an operand byte or word specified in register AL or AX
When a match is found ZF=1, otherwise it is reset

LODS

Load string byte or string word
Loads the AL/AX register with the contents of a string pointed to by DS:SI
No flag is affected

STOS

Stores the contents of the AL/AX register to a location in a string pointed to by ES:DI
No flag is affected
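How the repeat prefixes interact with the zero flag can be sketched in Python. The two functions below model REPE CMPSB and REPNE SCASB on byte strings, assuming the direction flag is clear (addresses increase) and using Python indexes in place of SI and DI; they are illustrations, not a full description of the instructions.

```python
def repe_cmps(source, destination, count):
    """Model of REPE CMPSB: compare until a mismatch or until CX reaches 0.
    Returns (ZF, remaining CX)."""
    si = di = 0
    cx, zf = count, 1
    while cx:
        zf = int(source[si] == destination[di])
        si += 1
        di += 1
        cx -= 1
        if not zf:
            break
    return zf, cx


def repne_scas(al, string):
    """Model of REPNE SCASB: scan until AL is found or the string ends.
    Returns (ZF, index of the last element examined)."""
    zf, di = 0, -1
    for di, element in enumerate(string):
        zf = int(element == al)
        if zf:
            break
    return zf, di


print(repe_cmps(b"HELLO", b"HELLO", 5))    # strings match
print(repe_cmps(b"HELLO", b"HELP!", 5))    # mismatch at the fourth byte
print(repne_scas(ord("L"), b"HELLO"))      # first 'L' found
```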











A bus is used to communicate between components in a computer system. Busses are typically specialized, with (for instance) a memory bus having different characteristics from an IO bus. Communications used in networks are different again.

An important distinction to be drawn early is that between a bus and a point-to-point network. A bus has a single wire, or connection of wires, with multiple sources and destinations. A point-to-point network has some sort of interconnection scheme between sources and destinations in which the nodes are endpoints. In general a bus is cheaper and slower than other network topologies; there is a shift in progress in some areas from busses to point-to-point networks; this is most noticeable in high-speed ethernet and in AMD's multiprocessor system ``busses.''

Bus operations should be thought of in terms of transactions. A bus transaction is an operation on the bus; for instance, a memory read, or an interrupt service, or an IO write. A bus transaction is initiated by a bus master who requests that some operation take place; a slave will respond to the transaction appropriately. Different busses support different sets of transactions.

Bus Components

Busses typically have several components in common. In general, dedicated groups of wires are used for the different components (but see multiplexing, below).

1. Data: this is the whole purpose of the bus - to transmit data. Ordinarily, when you talk about the ``width'' of a bus (8-bit bus, 32-bit bus, etc), you're talking about how many wires are used for data.

2. Address: this is how the components on the bus recognize that data on the bus is intended for them. Ordinarily, whenever a bus transaction takes place, an address is put on the bus... recipient can tell who it's for. Memory and IO busses don't normally put a source address on; that's either implicit in the transaction or irrelevant. Networks, on the other hand, usually do.

3. Control: these wires contain a variety of information about the transaction, for instance what type of transaction it is (read, write, interrupt request, etc). Most parallel busses may also have a global clock, which would be a control line as well. A serial bus can't very well do that, so the clock has to be carried with the data somehow.

4. Power and Ground: the bus has to have a ground wire, so the different components have a common voltage reference (they'll normally have lots of ground wires, for electrical immunity!). Also, all the cards in the bus need to get power from somewhere, so the bus itself is a convenient place to distribute it.

PCI Bus

The Peripheral Component Interconnect (PCI) bus is the standard I/O bus on recent computers in general, and PCs in particular. It was developed by Intel in 1993 to replace the various busses which had been in use on both PCs and Macintoshes.











It is a 32-bit, 33 MHz bus with multiplexed address and data, and very nice capabilities for autoconfiguration ("Plug and Play"). It also supports both old, 5 volt devices and newer, 3.3 volt devices.

To Intel's credit, it is a remarkably architecture-neutral bus.

There are many extensions to PCI. Best known is that it has simply been extended to 64 bits and 66 MHz. In addition, there is a variant called PC-104+, which is a 32-bit PCI bus in a highly shock and vibration resistant packaging. PCI-X is a backward-compatible extension to PCI, with PCI-X itself running at up to 133 MHz and PCI-X 2.0 at 266 MHz and 533 MHz. This latter also defines a 16 bit interface for space-constrained applications, and a new bus mastering protocol (PCI SIG likes to call this peer-to-peer) that looks a lot like messaging.

All transfers on the PCI bus are "burst" transfers. What this means is that once a device obtains the bus to perform a transfer, it is able to hang on to the bus indefinitely, and keep sending more data every bus cycle (there's actually a timer in the bus controller which will take control back after some configurable time period, to keep transfers from being too long. The longer the transfers are the better the throughput, but this can cause unacceptable delays for other devices).

Configuration Space

One of the nicest features of PCI is its support for autoconfiguration. In addition to every device having an address on the PCI bus, every card has its own address determined by which slot it is plugged into. This is referred to as the card's configuration space, and can be queried (and parts of it can be written) by the CPU. This normally occurs at boot time; it may be performed by the BIOS prior to starting the boot loader, or it may be performed by the OS as it boots.

Figure: the configuration space for a PCI device, consisting of a header (64 bytes, offsets 00H to 3FH) followed by an available area (192 bytes).

Identification











The most important parts of the configuration space are:

Vendor and Device ID
The Vendor ID is a 16 bit number, assigned by the PCI SIG. You can look this number up in a database to find out who built the card. The device ID is another 16 bit number, assigned by the vendor. You can look this up in a database to find out the device model number. Put them together and you can know what kind of device you're going to be talking to, so you can run the right device driver.

Class Code
This is a 24 bit number, assigned by I-don't-know-who, which identifies what kind of device is on the card. The difference between this and the vendor/device id fields is that this will specify something like "serial port". You can run the device based on its class code, but to take advantage of any extra features (like the fact it might be an 8-port card instead of a single-port card) requires the vendor and device IDs.

Base Registers
Up to six base registers can be specified, for the devices located on the card. If you have fewer than six logical devices you will actually use fewer than these; if you have more, you will have to get into some ugly hacks (for instance, on an eight port serial card I have, six of the ports' base addresses are specified in the base addresses, while two are at fixed offsets from the first two of the six). Unlike the vendor and device ID fields, and the class codes, the base register addresses are read/write.
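Reading these fields out of a 64-byte header can be sketched with Python's struct module. The byte offsets used below (vendor ID at 00H, device ID at 02H, class code at 09H-0BH, base registers from 10H) come from the standard PCI header layout rather than from these notes, and the sample header contents, including the device ID 1234H and the base register value, are made up for the illustration.

```python
import struct


def parse_pci_header(header):
    """Extract the identification fields from a 64-byte PCI configuration header."""
    vendor_id, device_id = struct.unpack_from("<HH", header, 0x00)
    _revision, prog_if, subclass, base_class = struct.unpack_from("<4B", header, 0x08)
    base_registers = struct.unpack_from("<6I", header, 0x10)
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "class_code": (base_class << 16) | (subclass << 8) | prog_if,  # 24 bits
        "base_registers": list(base_registers),
    }


# Hypothetical header of a serial-port card, built only for this example.
sample = bytearray(64)
struct.pack_into("<HH", sample, 0x00, 0x8086, 0x1234)            # vendor, device
struct.pack_into("<4B", sample, 0x08, 0x01, 0x02, 0x00, 0x07)    # revision, class code
struct.pack_into("<I", sample, 0x10, 0x0000E001)                 # first base register

info = parse_pci_header(sample)
print(f"vendor={info['vendor_id']:04X}H device={info['device_id']:04X}H "
      f"class={info['class_code']:06X}H")
```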

PCI Commands

There are a total of 16 possible commands on a PCI cycle. They're in the following table:

Command   Command Type
0000      Interrupt Acknowledge
0001      Special Cycle
0010      I/O Read
0011      I/O Write
0100      reserved
0101      reserved
0110      Memory Read
0111      Memory Write
1000      reserved
1001      reserved
1010      Configuration Read
1011      Configuration Write
1100      Multiple Memory Read
1101      Dual Address Cycle
1110      Memory-Read Line
1111      Memory Write and Invalidate

Here are some notes on the different transfer types

Interrupt Acknowledge (0000)
The interrupt controller automatically recognizes and reacts to the INTA (interrupt acknowledge) command. In the data phase, it transfers the interrupt vector to the AD lines.

Special Cycle (0001)

AD15-AD0            Meaning
0x0000              Processor Shutdown
0x0001              Processor Halt
0x0002              x86 Specific Code
0x0003 to 0xFFFF    Reserved

I/O Read (0010) and I/O Write (0011)
Input/Output device read or write operation. The AD lines contain a byte address (AD0 and AD1 must be decoded). PCI I/O ports may be 8 or 16 bits. PCI allows 32 bits of address space. On IBM compatible machines, the Intel CPU is limited to 16 bits of I/O space, which is further limited by some ISA cards that may also be installed in the machine (many ISA cards only decode the lower 10 bits of address space, and thus mirror themselves throughout the 16 bit I/O space). This limit assumes that the machine supports ISA or EISA slots in addition to PCI slots. The PCI configuration space may also be accessed through I/O ports 0x0CF8 (Address) and 0x0CFC (Data). The address port must be written first.

Memory Read (0110) and Memory Write (0111)
A read or write to the system memory space. The AD lines contain a doubleword address. AD0 and AD1 do not need to be decoded. The Byte Enable lines (C/BE) indicate which bytes are valid.

Configuration Read (1010) and Configuration Write (1011)
A read or write to the PCI device configuration space, which is 256 bytes in length. It is accessed in doubleword units. AD0 and AD1 contain 0, AD2-7 contain the doubleword address, AD8-10 are used for selecting the addressed unit in a multifunction device, and the remaining AD lines are not used.
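The value written to the address port 0x0CF8 packs the same fields into one doubleword. The Python sketch below builds it using the conventional layout of that port (enable bit 31, bus number in bits 23-16, device in bits 15-11, function in bits 10-8, doubleword-aligned register offset in bits 7-2); the bus and device numbers are not described in these notes and are taken from that conventional layout. The function only computes the value; actually writing it to the port needs privileged I/O access.

```python
def config_address(bus, device, function, register):
    """Build the doubleword written to port 0x0CF8 before accessing 0x0CFC."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus, device or function out of range")
    return (
        (1 << 31)              # enable bit
        | (bus << 16)          # bus number
        | (device << 11)       # device (slot) number
        | (function << 8)      # function within a multifunction device
        | (register & 0xFC)    # doubleword-aligned register offset
    )


# First base register (offset 10H) of device 3, function 0 on bus 0
print(f"{config_address(0, 3, 0, 0x10):08X}H")
```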

Multiple Memory Read (1100)
This is an extension of the memory read bus cycle. It is used to read large blocks of memory without caching, which is beneficial for long sequential memory accesses.

Dual Address Cycle (1101)
Two address cycles are necessary when a 64 bit address is used, but only a 32 bit physical address exists. The least significant portion of the address is placed on the AD lines first, followed by the most significant 32 bits. The second address cycle also contains the command for the type of transfer (I/O, Memory, etc). The PCI bus supports a 64 bit I/O address space, although this is not available on Intel based PCs due to limitations of the CPU.

Memory-Read Line (1110)
This cycle is used to read in more than two 32 bit data blocks, typically up to the end of a cache line. It is more efficient than normal memory read bursts for a long series of sequential memory accesses.

Memory Write and Invalidate (1111)
This indicates that a minimum of one cache line is to be transferred. This allows main memory to be updated, saving a cache write-back cycle.

Interrupt Handling

PCI uses four pins, called INTA-INTD, for interrupt requests. When an interrupt is required, the proper pin is asserted. A card which has only a single interrupt will normally use INTA. In modern systems the BIOS supports the PCI bus.

------------------------------------------------------------------------------------------------------------------











Port details

The parallel port LPT1 is normally at I/O port addresses 378H, 379H and 37AH.
The secondary port (if present) is located at 278H, 279H and 27AH.
The parallel printer interface is located on the rear of the PC.
LPT stands for Line Printer.
The printer interface gives access to eight lines that can be programmed to receive or send data.

The Centronics interface implemented by the parallel port uses two connectors.
One is a 25 pin D type on the back of the PC.
The other one is a 36 pin connector on the back of the printer.
The parallel port can work both as a transmitter and as a receiver.
Other than printers, CD ROMs can also be interfaced through the parallel port.

Universal Serial Bus (USB) allows the addition of a new device to a PC by plugging it into the back of the machine or daisy-chaining it from another device on the bus. The device is immediately available for use (no rebooting required) and often does not need a device driver to be installed (depending on the operating system being used).

USB 1.1 allows communication speeds of 12 Megabits per second (or 1.5 Megabytes per second). The enhanced USB 2.0 will use the same cables, connectors, and software interfaces and will be backward compatible with older devices. USB 2.0 carries data at 360 to 480 Mbps (60 MBps). All cables use four wires; the distance between two devices can be up to five meters.

A big advantage of USB devices, apart from their much greater speed, is that USB devices configure themselves automatically: gone are the days when you had to fiddle with IRQ settings, DMA channels, and I/O addresses to make a gadget work. Another benefit is that because USB devices do not require IRQ settings, DMA channels, or I/O settings, COM and LPT ports currently occupying an IRQ can be freed up, making more IRQ interrupts available for other uses.

Another advantage of USB is that it is a standard port and can support just about any type of device, including keyboards, mice, serial peripherals (e.g. modems), printers, audio input/output, joysticks, digital cameras, scanners, external hard disks and CD burners. Soon, the collection of space-consuming and costly dedicated ports for keyboards, printers, mice, modems etc will disappear and be replaced by USB. USB can be built into most chipsets for about $1. The other advantage is that you can mix and match devices as much as you like. In the old days, if you had a parallel port scanner and a printer, they had to share the single printer port (and neither of them would work properly for long.)

If you have several USB devices, it is best to use a USB hub. This is a little box that splits a single USB port into 4 or more ports.











USB uses a four-wire cable interface. Two of the wires are used in a differential mode for both transmitting and receiving data, and the remaining two wires are power and ground. The source of the power to a USB device can come from the host, a hub, or the device can be "self powered." There are two different connector types on each end of a USB cable. One of these connectors is for upstream communications, and the other for downstream. Each cable length is limited to about 5 meters.

USB has four types of communication transfer modes:

• control,
• interrupt,
• bulk, and
• isochronous.

Control mode is initiated by the host. In this mode, every data transfer must send data in both directions, but only in one direction at a time. The control mode is used mainly for initialization of devices, but it can also be used to transfer small amounts of data.

In interrupt mode, interrupts do not occur in the usual sense. As in control mode, the host has to initiate the transfer of data. Interrupt mode works by the host querying devices to see if they need to be serviced.

Bulk mode and isochronous mode complement each other in a sense. Bulk mode is used when data accuracy is of prime importance, but the rate of data transfer is not guaranteed. An example of this would be disk drive storage. Isochronous mode sacrifices data accuracy in favor of guaranteed timing of data delivery. An example of this would be USB audio speakers.

These four modes will be discussed in more detail below.

Above is an example of USB ports found on PCs and on some USB peripherals including keyboards and monitors.

The PC host typically has connections for two external USB ports. Each of these two connectors on the PC is actually a connection to a separate root hub inside the PC. If either of the two root hubs needs to have more than one device connected to it, a downstream USB hub is required to expand connections. Hubs are used to add to the number of devices that can be connected to one USB port. They can be considered to be a repeater of sorts and also a controller. When a device is connected downstream of a hub, the hub does the connect detection of the new device and notifies the host.

Hubs can be inside the device itself -- for example, in a keyboard that may have an additionaltwo downstream USB connectors for additional devices. A hub can have a combination of high

and low speed devices connected to it, up to a maximum of four additional hubs downstreamfrom itself. A hub's upstream port to the PC must be high speed. The hub acts as a traffic cop,handling communication to downstream devices as either high or low speed. A hub can ignore adownstream device that is not behaving properly. Hubs can be either self-powered or receive power from the USB bus. USB 1.x hubs support both low and high-speed data transfers.There are several hardware requirements for devices that are placed on the USB bus. Five voltsis the nominal supply voltage on the bus. A device that requires 100mA or less can be poweredfrom the host or any hub, provided that the total available power hasn't already been exhausted by other devices. A device on the bus can draw up to 500mA from it. However, not all USBhosts (especially a battery powered PC) or bus-powered hubs will allow a device to draw morethan 100mA from the bus. For this reason, a USB device that draws more than 100mA should, inmost cases, be self-powered .A device tells the host how much current is required for its operation. Self-powered devicesusually get their power from a separate power supply or batteries. A battery-powered device plugged into the bus can get its power from the bus if it meets the tests above, and it can thenswitch back over to battery power when it is disconnected from the bus or when the host is shutdown. When a device is in suspend mode, it cannot draw any more than 500uA from the bus if itis bus-powered. Also, if a device has not seen any activity on its bus in 3 mS, it needs to go intosuspend mode. A host can initiate a resume command to a device that is in suspend mode. Adevice can also issue a remote wakeup to an inactive host to make it active.All devices have endpoints, which are memory buffers. An endpoint can be as simple as anaddressable single register, or it can be a block of memory that is used to store incoming and/oroutgoing data. 
There may be multiple endpoints inside a device. Each device has at least one endpoint -- "endpoint 0" -- which is used as a control endpoint. It must be able to both send and receive data, but can only communicate in one direction at a time. Typically, when a device receives data such as an Out or Setup command from the host, this data is stored in the endpoint and the device's microprocessor is interrupted and works on this data. When a device receives an In command that is addressed to it from the host, data for the host that is stored in the endpoint is sent to the host.

The host is considered to be the master in almost all cases. One exception is when a device issues a remote wakeup to the host, as discussed above. There are time limits for both the host and device to respond to each other. For example, if the host requests data from a device using an In command, the device must send the data back to the host within 500 ms, in some cases. Depending on the transaction type, the host and/or the device may respond to data received with an acknowledgement. Data transfer involves quite a bit of error-checking and handshaking. The different types of data packets sent and received use different ways to verify correct data transfer.

A logical connection link needs to be set up between the host and a device before a transaction can occur. This connection is referred to as a Pipe. It is set up as soon as possible after a host has recognized a device as being connected. When the host responds to a connect signal from the device, one of the parameters that is sent to the host is the device's required data transfer type and speed. The host can refuse to establish a Pipe if the host does not have enough bandwidth to support the device's request or if its power requirements cannot be met. The device, at its discretion, can lower its requested data rate and try again until the host accepts it and initiates a Pipe.

When a device is connected, it also sends to the host descriptor information on the types of endpoints in the device, the type of data transfer it uses, size of data packets, endpoint addresses within the device, and, if used, the time required between data transfers.

The following describes a typical data flow for a device when it is initially plugged into a host's bus while the host is active. Remember here that the host has an internal USB hub, and additional hubs may be connected downstream from the host's hub.

1. The host recognizes that a device has been attached to one of its USB hubs. It realizes this by a simple resistive divider that is connected to the differential data pair of wires in the USB bus. These resistors are inside the USB hubs and devices.

2. The host sends a Get_Port_Status request to the hub to find out more about what has been plugged in. It could be another hub, a device connected directly to the host hub, or a device that has been plugged into one of the downstream hubs.

3. After receiving a response from the hub, the host issues a Set_Port_Feature command in which the hub issues a reset over the data pair, but only to the newly connected device on the USB bus.

4. The host then checks to see if the device has come out of the reset state by issuing a Get_Port_Status command to the hub. After reset, the device is in the Default state and can only draw a maximum of 100 mA. In the Default state, the device can communicate with the host through Endpoint 0.

5. The hub now detects the device's speed by using the resistive dividers that are attached to the USB bus. The hub sends the speed of this device back to the host.

6. The host then sends a Get_Descriptor command to the hub in which the hub gets the packet size needed from this particular device and sends the result back to the host.

7. The host now issues a Set_Address command to the hub, which sends this information to the device. The device in turn acknowledges the command back through the hub to the host and sets up this address internally.

8. To learn more about this device, the host sends a Get_Descriptor command to the address that the device has been given. The information that is returned to the host consists of various details of the device that the host needs to know for its operation. These queries by the host continue two more times to retrieve all the information needed.

9. Based on the information received from the device, the host determines the best device driver to use for communications with it.

10. The device driver in the host now takes over by requesting a Set_Configuration command. There can be several configurations for one device, and the device driver determines which to use based on information received from the device in response to the Get_Descriptor command.

11. The device is now ready for use.
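
The request sequence above can be sketched as a small state walk. This is only an illustrative model, not host-controller code: the state names ATTACHED, ADDRESSED and CONFIGURED, the ENUMERATION_STEPS table and both functions are assumptions made for this sketch. Only the Default state, the request names and the 100 mA / 500 mA figures come from the text.

```python
from enum import Enum, auto


class DeviceState(Enum):
    ATTACHED = auto()    # assumed name: plugged in, not yet reset
    DEFAULT = auto()     # after reset; talks through Endpoint 0 only
    ADDRESSED = auto()   # assumed name: Set_Address has completed
    CONFIGURED = auto()  # assumed name: Set_Configuration has completed


# (request issued by the host, device state once the request completes)
ENUMERATION_STEPS = [
    ("Get_Port_Status", DeviceState.ATTACHED),      # step 2
    ("Set_Port_Feature", DeviceState.DEFAULT),      # step 3: hub resets the device
    ("Get_Port_Status", DeviceState.DEFAULT),       # step 4: confirm reset finished
    ("Get_Descriptor", DeviceState.DEFAULT),        # step 6: packet size
    ("Set_Address", DeviceState.ADDRESSED),         # step 7
    ("Get_Descriptor", DeviceState.ADDRESSED),      # step 8: full device details
    ("Set_Configuration", DeviceState.CONFIGURED),  # step 10
]


def enumerate_device():
    """Walk the request sequence and return the final state plus a log."""
    state = DeviceState.ATTACHED
    log = []
    for request, next_state in ENUMERATION_STEPS:
        state = next_state
        log.append((request, state))
    return state, log


def max_bus_current_ma(state):
    """Upper bound on bus current: 100 mA until configured, then up to 500 mA."""
    return 500 if state is DeviceState.CONFIGURED else 100


final_state, log = enumerate_device()
for request, state in log:
    print(f"{request:18s} -> {state.name}")
```

Steps 5 and 9 do not appear in the table because they involve no request on the bus: the hub senses speed electrically, and driver selection happens inside the host.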

As you can see, the USB protocol is a fairly complex arrangement. This strict pattern of query and response, however, is important in alleviating potential conflicts on the bus.

--------------------------------------------------------------------------------------------------------------------











Introduced in 1985, the Intel 80386 provided a major upgrade to the earlier 8086 and 80286 processors in system architecture and features. The 80386 has provided a base reference for the design of all Intel processors in the X86 family since that time, including the 80486, Pentium, Pentium Pro, and the Pentium II and III. All of these processors are extensions of the original design of the 80386. All are upwardly compatible with it. Programs written to run on the 80386 can be run with little or no modification on the later devices. The addressing scheme and internal architecture of the 80386 have been maintained and improved in the later microprocessors. Thus a family of devices has evolved over the years that is the standard for a wide industry and upon which a vast array of software and operating system environments is based.

Major features of the 80386 include the following:

• A 32-bit wide address bus providing a real memory space of 4 gigabytes.
• A 32-bit wide data bus.
• Preemptive multitasking.
• Memory management, with four levels of protection.
• Virtual memory support, allowing 64 terabytes of virtual storage.
• Support for 8, 16, and 32-bit data types.
• Three primary modes of operation (Real, Protected, Virtual 8086).
• CMOS IV technology, 132-pin grid array.
• Object code compatibility with earlier X86 designs.

PIN DESCRIPTIONS

Symbol        Type  Function
CLK2          In    Provides the fundamental timing for the device.
D0 – D31      I/O   Data Bus inputs data during memory, I/O, or interrupt read cycles, and outputs data during memory and I/O write cycles.
A2 – A31      Out   Address Bus provides physical memory or I/O port addresses.
BE0# – BE3#   Out   Byte Enable signals decode A0 and A1 to indicate specific banks for memory data transfers.
W/R#          Out   Write/Read defines the nature of the data transaction in progress.
D/C#          Out   Data/Control distinguishes data transfer cycles (memory or I/O) from control cycles (interrupt, halt, instruction fetch).
M/IO#         Out   Memory/IO identifies the source/destination of the current cycle.
LOCK#         Out   Bus Lock responds to a prefix byte on an instruction that indicates that other bus masters may not intercede in the current cycle until it is complete.
ADS#          Out   Address Status indicates that a valid set of addressing signals is being driven onto the device pins. These include W/R#, D/C#, M/IO#, BE0#-BE3#, and A2-A31.
NA#           In    Next Address is used to request address pipelining.
READY#        In    Bus Ready requests a wait state from attached devices.
BS16#         In    Bus Size 16 requests a 16-bit rather than a 32-bit data transfer.
HOLD          In    Bus Hold Request initiates a DMA cycle.
HLDA          Out   Bus Hold Acknowledge indicates that the processor is honoring a DMA request.
BUSY#         In    Busy is a synchronization signal from an attached coprocessor, e.g., 80387.
ERROR#        In    Error signals an error condition in an attached coprocessor.
PEREQ         In    Processor Extension Request synchronizes a coprocessor data transfer via the 80386.
INTR          In    Interrupt accepts a request from an interrupting device (maskable).
NMI           In    Non-Maskable Interrupt forces an interrupt that cannot be ignored.
RESET         In    Reset causes the processor to enter a known state and destroys any execution in progress.
N/C           -     No Connect indicates pins that are not to have any electrical connections.
VCC           In    Power Supply, typically +5 volts.
VSS           In    Ground.
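
Because the address bus starts at A2, the two low address bits never leave the chip directly; they are expressed through BE0#-BE3#. The sketch below illustrates that decoding for a transfer that fits inside one 32-bit doubleword. The function name and its return shape are assumptions made for illustration, and the signals are modelled as active low (0 = asserted), as the # suffix indicates.

```python
def byte_enables(address, size):
    """Return (address driven on A31-A2, [BE0#, BE1#, BE2#, BE3#]).

    Each BEn# level is 0 when byte lane n takes part in the transfer
    (active low) and 1 otherwise. Only transfers that fit inside one
    aligned doubleword are modelled here.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    first_lane = address & 0b11          # value of the internal A1:A0 bits
    if first_lane + size > 4:
        raise ValueError("transfer crosses a doubleword boundary; "
                         "it would need more than one bus cycle")
    enables = [0 if first_lane <= lane < first_lane + size else 1
               for lane in range(4)]
    return address & ~0b11, enables


# A 16-bit word at address 1001h uses byte lanes 1 and 2.
print(byte_enables(0x1001, 2))
```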

DATA FLOW

Refer to the following diagram for illustration.

The Intel 80386 data flow consists of three primary areas. These are the bus interface unit (BIU), the central processing unit (CPU), and a memory management unit (MMU). These are interconnected within the device by several 32-bit-wide data busses and an internal control bus.

The Bus Interface Unit (BIU) provides the attachments of the device to the external bus system. The circuits include a set of address bus drivers which generate or receive the A2 – A31 address lines; the BE0 – BE3 byte selection lines; the control lines M/IO, D/C, W/R, Lock, ADS, NA, BS16, and Ready; and interface with the D0 – D31 data bus lines. The unit includes a pipeline control element which provides the memory access pipelining that permits fast data transfer from contiguous memory locations. The unit also includes a set of multiplex transceivers to handle the direction of incoming or outgoing data and address information. Also included is a control element that handles requests for interrupts, DMA cycles, and coprocessor synchronization.











The Central Processing Unit (CPU) is connected to the BIU via two paths. One is the direct ALU bus (across the bottom of the drawing) that allows exchange of addressing information and data between the CPU and the BIU if needed. The second is the normal path for instruction parts, which go by way of an instruction prefetching element that is responsible for requesting instruction bytes from the memory as needed; an instruction predecoder that accepts bytes from the queue and ensures at least 3 instructions are available for execution; and the instruction decoder and execution unit that causes the instruction to be performed. This is accomplished by the use of microprograms stored in the system control ROM, which is stepped through to control the data flow within and around the Arithmetic Logic Unit (ALU).

The ALU consists of a register stack which contains both programmer-accessible and non-accessible 32-bit registers; a hardware multiply/divide element; and a 64-bit barrel shifter for shifts, rotates, multiplies, and divides. The ALU not only provides the data processing for the device but is also used to compute effective addresses (EAs) for protected mode addressing.

The Memory Management Unit (MMU) provides the support for the segmentation of main memory in both protected mode and real mode, and the paging elements for virtual memory. In real mode, the segmentation of the main memory is limited to a maximum segment size of 64K bytes and a maximum memory space of 1 megabyte (1,048,576 bytes). This is in concert with the Intel 8086 upon which this processor is based. In protected mode, several additional registers are added to support variable length segments to a maximum theoretical size of 4 gigabytes, which in turn supports multitasking and execution priority levels. Virtual mode, using the device's paging unit, allows a program or task to consume more memory than is physically attached to the device through the translation of supposed memory locations into either real memory or disk-based data.

MODES OF OPERATION

The Intel 80386 has three modes of operation available. These are Real Mode, Protected Mode, and Virtual 8086 Mode.

Real Mode operation causes the device to function as would an Intel 8086 processor. It is faster by far than the 8086. While the 8086 was a 16-bit device, the 80386 can provide 32-bit extensions to the 8086's instructions. There are additional instructions to support the shift to protected mode as well as to service 32-bit data. In Real Mode, the address space is limited to 1 megabyte (1,048,576 bytes). The bottom 1,024 bytes contain the 256 4-byte interrupt vectors of the 8086. The Reset vector of the 8086 is FFFF0h. While the system can function as a simple DOS computer in this mode forever, the main purpose of the mode is to allow the initialization of several memory tables and flags so that a jump to Protected Mode may be made.

Protected Mode provides the 80386 with extensive capabilities. These include memory management, virtual memory paging, multitasking, and the use of four privilege levels, which allows the creation of sophisticated operating systems such as Windows NT and OS/2. (These will be further explained.)

Virtual 8086 Mode allows the system, once properly initialized in Protected Mode, to create one or more virtual 8086 tasks. These are implemented essentially as would be a Real Mode task, except that they can be located anywhere in memory, there can be many of them, and they are limited by Real Mode constructs. This feature allows a 386-based computer, for example, to provide multiple DOS sessions or to run multiple operating systems, each one located in its own 8086 environment. OS/2 made use of this feature in providing multiple DOS sessions and to support its Windows 3.1 emulator. Windows NT uses the feature for its DOS windows.

REGISTER ORGANIZATION











Programmer-visible Registers

The ‘386 provides a variety of General Purpose Registers (GPRs) that are visible to the programmer. These support the original 16-bit registers of the 8086, and extend them to 32-bit versions for protected mode programming.

Chart goes here.

The AX, BX, CX, and DX registers exist in the same form as in the 8086. They may be used as 16-bit registers when called with the "X" in their name. They may also be used as 8-bit registers when defined with the "H" and "L" in their names. Hence, the AX register is used as a 16-bit device while the AH and AL are used as 8-bit devices. Similarly, the Source Index (SI), Destination Index (DI), Base Pointer (BP) and Stack Pointer (SP) registers exist in their traditional 16-bit form.

To use any of these registers as 32-bit entities, the letter "E", for extended, is added to their names. Hence, the 16-bit AX register can become the 32-bit EAX register, the 16-bit DI register becomes the 32-bit EDI register, etc.

The registers of the ‘386 include the 8086's Code Segment (CS) register, Stack Segment (SS) register, Data Segment (DS) register, and Extra Segment (ES) register, which are used as containers for values pointing to the base of these segments. Additionally, two more data-oriented segment registers, the FS and GS registers, are provided. In real mode, these registers contain values that point to the base of a segment in the real mode's 1-megabyte (1,048,576-byte) address space. The segment value is shifted four bit positions to the left and an offset is added to it, which generates a real address. In protected mode, the segment registers contain a "selector" value which points to a location in a table where more information about the location of the segment is stored.

The ‘386 also provides an Instruction Pointer (IP) register and a Flags (FLAGS) register which operate as they did in the 8086 in real mode. In protected mode, these become 32-bit devices which provide extended features and addressing.

The 32-bit FLAGS register contains the original 16 bits of the 8086-80286 flags in bit positions 0 through 15 as follows. These are available to real mode.

Bit    Flag   Description
0      CF     Carry Flag
1      1      Always a 1
2      PF     Parity Flag
3      0      Always a 0
4      AF     Auxiliary Carry Flag
5      0      Always a 0
6      ZF     Zero Flag
7      SF     Sign Flag
8      TF     Trap Flag
9      IF     Interrupt Enable
10     DF     Direction Flag
11     OF     Overflow Flag
12-13  PL1,2  I/O Privilege Level Flags
14     NT     Nested Task Flag
15     0      Always a 0

Two more flags are provided to support protected mode.

Bit    Flag   Description
16     RF     Resume Flag
17     VM     Virtual Mode
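
As a quick illustration of the bit layout in the two tables above, the following sketch unpacks a FLAGS value into named fields. The function name, the dictionary layout and the key "IOPL" (used here for the two I/O privilege level bits, 12-13) are assumptions made for this example only.

```python
# Bit positions taken from the two tables above.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8,
    "IF": 9, "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17,
}


def decode_flags(value):
    """Split a 32-bit FLAGS value into single-bit flags plus the 2-bit IOPL."""
    flags = {name: bool((value >> bit) & 1) for name, bit in FLAG_BITS.items()}
    flags["IOPL"] = (value >> 12) & 0b11   # bits 12-13 form one 2-bit field
    return flags


# 3246h has bits 1, 2, 6, 9, 12 and 13 set.
decoded = decode_flags(0x3246)
print({name: v for name, v in decoded.items() if v})
```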

Here are some brief descriptions of the functions of these flags.

CARRY FLAG – This flag is set when a mathematical function generates a carry out of the highest bit position of the result, such as when 9 + 1 = 10.

PARITY FLAG – This flag is set when the low-order 8 bits of a result contain an even number of 1 bits, that is, even parity.

AUXILIARY CARRY FLAG – This flag is set when there is a carry out of the lower four bits of an 8-bit byte due to a mathematical operation. It supports the use of packed BCD encoding for accounting.

ZERO FLAG – This flag is set if all bits of a result are 0.

SIGN FLAG – This bit is set if the high-order bit of a result is a 1. In signed mathematics, this indicates a negative number.

TRAP ENABLE FLAG – This flag supports the use of Exception 1 when single stepping through code with a debugger package. When the flag is set, the ‘386 will execute an Exception 1 interrupt after the execution of the next instruction. If reset, the ‘386 will execute an Exception 1 interrupt only at breakpoints.

INTERRUPT ENABLE FLAG – This flag, when set, allows interrupts via the INTR device pin to be honored.

DIRECTION FLAG – This flag supports string OP codes that make use of the SI or DI registers. It indicates which direction the succeeding count should take: decrement if the flag is set, and increment if the flag is clear.

OVERFLOW FLAG – This flag is set if an operation results in a carry into the uppermost bit of the result value that wrongly changes the sign bit, that is, if the signed result is too large for the destination.

I/O PRIVILEGE LEVEL – These two flags together select one of the four privilege levels of protected mode. The value is the least privileged level at which a task may still execute I/O instructions. The levels are sometimes called "rings", with ring 0 being the most privileged and ring 3 the least.

RESUME FLAG – This flag supports a debug register used to manage breakpoints in protected mode.

VIRTUAL MODE – This flag supports the third mode of operation of the processor, Virtual 8086 mode. Once in protected mode, if set, this flag causes the processor to switch to virtual 8086 mode.

Programmer-invisible Registers

To support protected mode, a variety of other registers are provided that are not accessible by the programmer. In real mode, the programmer can see and reference the segment registers CS, SS, DS, ES, FS, and GS as 16-bit entities. The contents of these registers are shifted four bit positions to the left, then added to a 16-bit offset provided by the program. The resulting 20-bit value is the real address of the data to be accessed at that moment. This allows a real address space of 2^20 bytes, or 1 megabyte (1,048,576 bytes). In this space, all segments are limited to 64K maximum size.

In protected mode, segments may be from 1 byte to 4 gigabytes in size. Further, more information is needed than in real mode. Therefore, the segment registers of real mode become holders for "selectors", values which point to a reference in a table in memory that contains more detail about the area in the desired segment. Also, a set of "Descriptor Registers" is provided, one for each segment register. These contain the physical base address of the segment, the segment limit (or the size of the segment relative to the base), and a group of other data items that are loaded from the descriptor table. In protected mode, when a segment register is loaded with a new selector, that selector references the table that has previously been set up, and the descriptor register for that segment register is given the new information from the table about that segment. During the course of program execution, addressing references to that segment are made using the descriptor register for that segment.

Four Control Registers CR0 – CR3 are provided to support specific hardware needs. CR0 is called the Machine Control Register and contains several bits that were derived from the 80286. These are:

PAGING ENABLED, bit 31 – This bit, when set, enables the on-chip paging unit for virtual memory.

TASK SWITCHED, bit 3 – This bit is set when a task switch is performed.

EMULATE COPROCESSOR, bit 2 – This bit causes all coprocessor OP codes to cause a Coprocessor-Not-Found exception. This in turn will cause 80387 math coprocessor instructions to have to be interpreted by software.

MONITOR COPROCESSOR, bit 1 – Works with the TS bit above to synchronize the coprocessor.

PROTECTION ENABLED, bit 0 – This bit enables the shift to protected mode from real mode.

System Address Registers

Four special registers are provided to reference the tables needed to support protected mode. These tables are the Global Descriptor Table (GDT), the Interrupt Descriptor Table (IDT), the Local Descriptor Table (LDT), and the Task State Segment (TSS). The addresses of these tables are stored in four special registers called the Global Descriptor Table Register (GDTR), the Interrupt Descriptor Table Register (IDTR), the Local Descriptor Table Register (LDTR), and the Task State Segment Register (TR).

These tables and registers will be discussed more fully below. Briefly, the GDTR and IDTR hold the 32-bit linear base address and the 16-bit limit of the GDT and IDT, respectively. Both of these tables are global in nature and so may be located anywhere, including in paged virtual memory. These tables are not located using segment registers.

The LDTR and TR hold 16-bit selector values for the LDT descriptor and the TSS descriptor, respectively. The LDT and TSS are task-specific segments, tied to individual tasks within the system. Selectors to locate these segments are stored in system segment registers. These selectors point to the table areas and work in conjunction with two descriptor registers that contain a 32-bit linear base address for the table, a 32-bit segment limit, and some attribute bytes. This is in concert with the approach for the programmer-visible segment registers.

MEMORY ORGANIZATION

In the 80386, data may take the physical form of an 8-bit byte, a 16-bit word, or a 32-bit double word. The electrical organization of the memory connections to the processor follows this model. For a word, the high-order byte is in the higher-addressed of the two bytes, while the low-order byte is in the lower-addressed byte. In the case of a 32-bit double word, the highest-order byte is located in the highest address, and the lowest byte is located in the lowest address. The address of the lowest byte is the address of the word or double word.

The 80386 also supports two other memory units. These are segments and pages. Segments can be variable in length, and this supports a programmer's approach to arranging code and data in some sort of logical order for the work being done. Pages in this case are 4,096 bytes in length, and these support the use of physical memory management, e.g., virtual memory.

Intel defines three distinct address spaces. These are physical, logical, and linear. The programmer sees and works with the logical address space. It is constructed of the information contained in a selector's table reference plus the addition of various offsets and indexes created by the addressing modes of the program's instructions. The programmer does not care about the amount of memory consumed – he/she considers it to be "virtually" infinite in size. In fact, since each task in an 80386 can have 16K selectors, and since offsets can be as much as 4 gigabytes in size, the programmer can work with as much as 64 terabytes of memory (2^14 selectors x 2^32 bytes = 2^46 bytes).

The segmentation unit of the processor translates the logical address space of the programmer to the 32-bit linear address space. If the paging unit is not enabled, the linear address space and the physical address space are one and the same, and the linear address is sent directly to the device pins, thus becoming the physical address. If the paging unit is in use, then it will further translate the linear address into a different address space that may engage outside storage to implement the memory that doesn't really exist in the system.

The primary difference between real and protected mode is the way in which the segmentation unit performs the translation between logical and linear addresses. Just like the 8086, in real mode the 80386 shifts the contents of the segment register left four bit positions, then adds the offset to create a 20-bit linear address which is the same as the physical address (in the 8086, there is no paging unit, so the linear and the physical addresses are always the same). In protected mode, every selector has a linear base address connected with it. This linear base address is stored in either the GDT or an LDT for the segment. From the table, the selector's linear base address is added to the offset from the program's instructions to create the final linear address. If paging is off, this is also the physical address. If paging is on, further modification occurs to create the final physical address from the linear address.

REAL MODE ARCHITECTURE

When the 80386 is reset, it assumes real mode. Real mode has the same base architecture as the 8086. The addressing mechanism, memory size, and interrupt handling are all identical to the real mode of the 80286. However, the system allows access to the 32-bit register set. Almost all of the 80386 instructions are accessible in real mode. The default operand size is 16 bits. To use the 32-bit registers and addressing modes of the 80386 in real mode, override prefixes must be used. The segment size in real mode is a maximum of 64K. Because of the paging feature available for protected mode, it is not possible to use the lock prefix on repetitive string instructions.

In real mode, the memory size is limited to one megabyte, using addressing lines A2 through A19. The exception to this is that the A20 – A31 lines are available from reset to the first execution of an intersegment jump or call. This allows the ROM holding the power-on reset vector and initialization code to be accessed. There is no paging in real mode, so the linear address is the physical address.

The physical address is created by the offset addition of the contents of a particular segment register (CS, DS, SS, or ES) and an offset provided by the analysis of an instruction, such as the contents of an index register (SI or DI). The addition is offset such that the segment register's contents are added four bit positions to the left of the offset's value, thus:

Illustration

Hence, although the segment registers and the offsets can contain no more than 16 bits each, the sum generated is 20 bits long, sufficient to access any location within the 1-megabyte (1,048,576-byte) address space of real mode. Since the segment register contents are always shifted four bits to the left, segments must always begin on 16-byte boundaries. Segments may be overlapped, that is, there is no restriction on segment registers containing values that are within another segment's 64K address space. This allows programmers to use only what is needed for a small program.

There are two reserved areas of the memory map. The interrupt vectors occupy the first 1,024 bytes of the memory space, addresses 0 to 3FFh. This permits 256 4-byte jump vectors as in the 8086. Also, locations FFFFFFF0h through FFFFFFFFh are reserved for the reset vector. The system will initially attempt to execute an instruction at FFFFFFF0h following the clearing of reset.

The Halt (HLT) instruction causes the processor to stop execution and keeps it from using its local bus. Halt can be cleared by three methods:

1. A non-maskable interrupt (NMI);

2. A regular interrupt (INTR) with interrupts enabled;

3. A system reset.
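
The real-mode address arithmetic described earlier in this section (segment shifted left four bits, plus a 16-bit offset) and the layout of the interrupt vector table can be checked with a few lines of Python. The function names are assumptions made for this sketch.

```python
def real_mode_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a real-mode address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset   # shifting left by 4 multiplies by 16


def interrupt_vector_address(number):
    """Address of the 4-byte vector for interrupt `number` (0-255)."""
    if not 0 <= number <= 255:
        raise ValueError("there are only 256 interrupt vectors")
    return number * 4


print(hex(real_mode_address(0x1234, 0x5678)))
# Overlapping segments: two different segment:offset pairs, same byte.
print(real_mode_address(0x1000, 0x0010) == real_mode_address(0x1001, 0x0000))
print(hex(interrupt_vector_address(0x21)))
```

The second call shows why segments can start only on 16-byte boundaries: adding 1 to the segment value moves the base by exactly 16 bytes.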

PROTECTED MODE ARCHITECTURE

The 80386 is most impressive when running in protected mode. The linear address space can be as great as 2^32 (4,294,967,296) bytes. With the paging unit enabled, the limit is 2^46 bytes, or about 64 terabytes. The device can run all 8086 and 80286 code. It provides memory management and a hardware-assisted protection mechanism that keeps one program's execution from interfering with another. Additional instructions are provided to support multitasking. The programmer sees an expanded address space available to her/him, and a different addressing scheme.

Memory Segmentation

Memory segmentation in protected mode uses a segment base value and an offset in the manner of real mode. However, because of the increased size of the address space now available, a more complex arrangement is used. The segment register now contains a value called a selector. This is a 16-bit value which contains an offset into a table. This table, called a descriptor table, contains descriptors, which are 8-byte values that describe more about the segment in question.

Two tables provided are the Global Descriptor Table (GDT) and the Local Descriptor Table (LDT). The GDT contains information about segments that are global in nature, that is, available to all programs and normally used most heavily by the operating system. The LDT contains descriptors that are application specific. Both of these tables have a limit of 64K, that is, 8,192 8-byte entries. There is also an Interrupt Descriptor Table (IDT) that contains information about segments containing code used in servicing interrupts. This table has a maximum of 256 entries.

The upper 13 bits of the selector are used as an offset into the descriptor table to be used. The lower 3 bits are:

• TI, a table selection bit (bit 2) – 0 = use the GDT, 1 = use the LDT.
• RPL, Requested Privilege Level bits (bits 0 and 1) – 00 is the highest privilege level, 11 is the lowest.
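
The selector fields can be pulled apart with shifts and masks. The helper below is a minimal sketch; its name and the dictionary keys are assumptions, and byte_offset is simply the index multiplied by the 8-byte descriptor size.

```python
def decode_selector(selector):
    """Split a 16-bit selector into table index, table choice and RPL."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a selector is a 16-bit value")
    index = selector >> 3                    # upper 13 bits
    return {
        "index": index,
        "table": "LDT" if selector & 0b100 else "GDT",   # TI bit
        "rpl": selector & 0b11,              # requested privilege level
        "byte_offset": index * 8,            # descriptors are 8 bytes each
    }


print(decode_selector(0x001B))   # entry 3 of the GDT, RPL 3
```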

The selector identifies the table to be used and the offset into that table where a set of descriptor bytes identifies the segment specifically. Each table can be 64K bytes in size, so if there are 8 bytes per table entry, a total of 8,192 entries can be held in one table at a given time. The contents of a descriptor are:

Bytes 0 and 1 – A 16-bit value that is connected to bits 0 – 3 of byte 6 to form the uppermost offset, or limit, allowed for the segment. This 20-bit limit means that a segment can be between 1 byte and 1 megabyte in size. See the discussion of the granularity bit below.

Bytes 2 and 3 – A 16-bit value connected to byte 4 and byte 7 to form a 32-bit base value for the segment. This is the value added to the offset provided by the program execution to form the linear address.

AV bit – Segment available bit, where AV=0 indicates not available and AV=1 indicates available.

D bit – If D=0, this indicates that instructions use 16-bit offsets and 16-bit registers by default. If D=1, the instructions are 32-bit by default.

Granularity (G) bit – If G=0, the segments are in the range of 1 byte to 1 megabyte. If G=1, the segment limit value is multiplied by 4K, meaning that the segments can have a minimum of 4K bytes and a maximum limit of 4 gigabytes in steps of 4K.

Byte 5, Access Rights byte – This byte contains several flags to further define the segment:

• Bit 0, Access bit – A=0 indicates that the segment has not been accessed; A=1 indicates that the segment has been accessed (and is now "dirty").

• Bit 1, R/W bit; bit 2, ED/C bit; and bit 3, E bit. If bit 3 = 0, then the descriptor references a data segment and the other bits are interpreted as follows: bit 2, interpreted as the ED bit, if 0, indicates that the segment expands upward, as in a data segment; if 1, indicates that the segment expands in the downward direction, as in a stack segment; bit 1, the R/W bit, if 0, indicates that the segment may not be written, while if 1 indicates that the segment is writeable.

If bit 3 = 1, then the descriptor references a code segment and the other bits are interpreted as follows: bit 2, interpreted as the C bit, if 0, indicates that we should ignore the descriptor privilege for the segment, while if 1 indicates that privilege must be observed; bit 1, the R/W bit, if 0, indicates that the code segment may not be read, while if 1 indicates that the segment is readable.

• Bit 4, System bit – If 0, this is a system descriptor; if 1, this is a regular code or data segment.

• Bits 5 and 6, Descriptor Privilege Level (DPL) bits – These two bits identify the privilege level of the descriptor.

• Bit 7, Segment Valid (P) bit – If 0, the descriptor is undefined. If 1, the segment contains a valid base and limit.
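
To make the scattered base and limit fields concrete, the sketch below reassembles them from the 8 raw descriptor bytes. The function and key names are assumptions made for illustration. The positions of the G, D and AV bits inside byte 6 (bits 7, 6 and 4) are not given in the text above; they follow the usual Intel descriptor layout and should be read as an assumption here. The size calculation treats the limit as the highest valid offset, so a segment holds limit + 1 units.

```python
def decode_descriptor(raw):
    """Decode an 8-byte segment descriptor given as a bytes object."""
    if len(raw) != 8:
        raise ValueError("a descriptor is exactly 8 bytes")
    # 20-bit limit: bytes 0-1 plus the low four bits of byte 6.
    limit = raw[0] | (raw[1] << 8) | ((raw[6] & 0x0F) << 16)
    # 32-bit base: bytes 2-3, byte 4, and byte 7 as the top byte.
    base = raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[7] << 24)
    access = raw[5]
    granular = bool(raw[6] & 0x80)       # G bit (assumed position: bit 7)
    unit = 4096 if granular else 1       # G=1 counts the limit in 4K steps
    return {
        "base": base,
        "limit": limit,
        "size_bytes": (limit + 1) * unit,
        "accessed": bool(access & 0x01),         # bit 0
        "read_write": bool(access & 0x02),       # bit 1, R/W
        "ed_or_c": bool(access & 0x04),          # bit 2, ED or C
        "is_code": bool(access & 0x08),          # bit 3, E
        "code_or_data": bool(access & 0x10),     # bit 4, 0 = system descriptor
        "dpl": (access >> 5) & 0b11,             # bits 5-6
        "present": bool(access & 0x80),          # bit 7, P
        "granular_4k": granular,
        "default_32bit": bool(raw[6] & 0x40),    # D bit (assumed position: bit 6)
        "available": bool(raw[6] & 0x10),        # AV bit (assumed position: bit 4)
    }


# Example: a readable code segment with base 0 covering all 4 gigabytes.
flat_code = bytes([0xFF, 0xFF, 0x00, 0x00, 0x00, 0x9A, 0xCF, 0x00])
info = decode_descriptor(flat_code)
print(hex(info["base"]), hex(info["limit"]), info["size_bytes"])
```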

Use the illustration below to follow the flow of address translation. Numbers in circles on the drawing match those below.

[Illustration: flow of address translation]

1. The execution of an instruction causes a request to access memory. The segment portion of the address to be used is represented by a selector value. This is loaded into the segment register. Generally, this value is not changed too often, and is controlled by the operating system.

2. The selector value in the segment register specifies a descriptor table and points to one of 8,192 descriptor areas. These contain 8 bytes that identify the base of the real segment, its limit, and various access and privilege information.

3. The base value in the descriptor identifies the base address of the segment to be used in linear address space.

4. The limit value in the descriptor identifies the offset of the top of the segment area from the base.

5. The offset provided by the instruction is used to identify the specific location of the desired byte(s) in linear address space, relative to the base value.
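Steps 1 through 5 can be sketched in a few lines. This is a simplified model under stated assumptions, not the processor's actual logic: the descriptor tables are hypothetical Python dictionaries mapping an index to a `(base, limit)` pair, the limit is taken as the last valid byte offset, and the selector is assumed to follow the usual 80386 split (bits 3-15 index, bit 2 table indicator, bits 0-1 requested privilege level). Privilege checks and expand-down segments are ignored.

```python
def split_selector(selector):
    """Split a 16-bit selector into (index, table indicator, RPL)."""
    index = (selector >> 3) & 0x1FFF   # 13 bits: one of 8,192 descriptors
    ti = (selector >> 2) & 1           # assumed: 0 = global table, 1 = local table
    rpl = selector & 3                 # lower two bits
    return index, ti, rpl


def linear_address(selector, offset, gdt, ldt):
    """Form a linear address from a selector and an offset (sketch only)."""
    index, ti, _rpl = split_selector(selector)
    table = ldt if ti else gdt         # step 2: pick the descriptor table
    base, limit = table[index]         # steps 3 and 4: base and limit
    if offset > limit:
        raise ValueError("offset is beyond the segment limit")
    return (base + offset) & 0xFFFFFFFF  # step 5: offset relative to base


# Hypothetical table contents, for illustration only
gdt = {1: (0x00100000, 0xFFFF)}
ldt = {}
address = linear_address(0x0008, 0x1234, gdt, ldt)
```

With these made-up table contents, selector 0008h picks descriptor 1 of the global table and the offset is added to the base 00100000h.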

The byte(s) thus specified are read or written as dictated by the instruction.

Program Invisible Registers

Several additional registers are provided that are normally invisible to the programmer but are required by the hardware of the processor to expedite its functions.

Each of the segment registers (CS, DS, SS, ES, FS, and GS) has an invisible portion that is called a cache. The name is used because these portions store information for short intervals; they are not to be confused with the L1 or L2 cache of the external memory system. The program invisible portions of the segment registers are loaded with the base value, the limit value, and the access information of the segment each time the segment register is loaded with a new selector. This allows just one reference to the descriptor table to be used for multiple accesses to the same segment. It is not necessary to reference the descriptor table again until the contents of the segment register are changed, indicating that a new segment of that type is being accessed. This system allows for faster access to the main memory, as the processor can look in the cache for the information rather than having to access the descriptor table for every memory reference to a segment.

The Global Descriptor Table Register (GDTR) and the Interrupt Descriptor Table Register (IDTR) each contain the base address of the corresponding descriptor table and its limit. The limit is a 16-bit value because the maximum size of the tables is 64K.

System Descriptors

The Local Descriptor Table Register contains a 16-bit wide selector only. This value references a system descriptor, which is similar to the descriptor described above, but which contains a type field that identifies one of 16 types of descriptor (specifically type 0010) that can exist in the system. This system descriptor in turn contains base and limit values that point to the LDT in use at the moment. In this way, there is one global descriptor table for the operating system, but there can be many local tables for individual applications or tasks if needed.











System descriptors contain information about operating system tables, tasks, and gates. The system descriptor can identify one of 16 types as follows. You will notice that some of these are to support backward compatibility with the 80286 processor.

Type Purpose

0000 Invalid

0001 Available 80286 Task State Segment

0010 Local Descriptor Table

0011 Busy 80286 Task State Segment

0100 80286 Call Gate

0101 Task Gate

0110 80286 Interrupt Gate

0111 80286 Trap Gate

1000 Invalid

1001 Available 80386 Task State Segment

1010 Reserved

1011 Busy 80386 Task State Segment

1100 80386 Call Gate

1101 Reserved

1110 80386 Interrupt Gate

1111 80386 Trap Gate

Protection and Privilege Levels

The 80386 has four levels of protection which support a multitasking operating system. These serve to isolate and protect user programs from each other and from the operating system. The privilege levels manage the use of I/O instructions, privileged instructions, and segments and segment descriptors. Level 0 is the most trusted level, while level 3 is the least trusted level.

Intel lists the following rules for the access of data and instruction levels of a task:

• Data stored in a segment with privilege level P can be accessed only by code executing at a privilege level that is at least as privileged as P.

• A code segment or procedure with privilege level P can only be called by a task executing at the same or a less privileged level than P.
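The two rules above can be written as simple comparisons, remembering that a numerically lower level is more privileged (level 0 is the most trusted). This is a minimal sketch with hypothetical function names; it models only these two rules and leaves out the other checks the processor makes.

```python
def can_access_data(current_level, data_level):
    """Rule 1: the code must be at least as privileged as the data."""
    # Lower number = more privileged, so "at least as privileged" is <=
    return current_level <= data_level


def can_call(current_level, code_level):
    """Rule 2: the caller is at the same or a less privileged level."""
    # "Same or less privileged" means a numerically equal or higher level
    return current_level >= code_level


# Level 3 application code versus level 0 operating system segments
app_reads_os_data = can_access_data(3, 0)
os_reads_app_data = can_access_data(0, 3)
app_calls_os_code = can_call(3, 0)
```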











At any point in time, a task can be operating at any of the four privilege levels. This is called the task’s Current Privilege Level (CPL). A task’s privilege level may only be changed by a control transfer through a gate descriptor to a code segment with a different privilege level.

The lower two bits of selectors contain the Requested Privilege Level (RPL). When a change of selector is made, the CPL of the task and the RPL of the new selector are compared. If the RPL is more privileged than the CPL, the CPL determines the level at which the task will continue. If the CPL is more privileged than the RPL, the RPL value will determine the level for the task. Therefore, the less privileged of the two levels is selected at the time of the change. The purpose of this function is to ensure that pointers passed to an operating system procedure are not of a higher privilege than the procedure that originated the pointer.

Gates

Gates are used to control access to entry points within the target code segment. There are four types:

• Call Gates – those associated with Call, Jump, Return and similar operation codes. They provide a secure method of privilege transfer within a task.

• Task Gates – those involved with task switching.

• Interrupt Gates – those involved with normal interrupt service needs.

• Trap Gates – those involved with error conditions that cause major faults in the execution.

A gate is simply a small block of code in a segment that allows the system to check for privilege level violations and to control entry to the operating system services. The gate code lives in a segment pointed to by special descriptors. These descriptors contain base and offset values to locate the code for the gate, a type field, a two-bit Descriptor Privilege Level (DPL) and a five-bit word count field. The word count indicates the number of words to be copied from the stack of the calling routine to that of the called routine. It is used only in Call Gates when a change in privilege level is required. Interrupt and Trap gates work similarly except that there is no pushing of parameters onto the stack. For interrupt gates, further interrupts are disabled. Gates are part of the operating system and are mainly of interest to system programmers.

Task Switching

An important part of any multitasking system is the ability to switch between tasks quickly. Tasks may be anything from I/O routines in the operating system to parts of programs written by you. With only a single processor available in the typical PC, it is essential that when the needs of the system or operator call for a switch in tasks, the switch be done quickly.

The 80386 has a hardware task switch instruction. This causes the machine to save the entire current state of the processor, including all the register contents, address space information, and links to previous tasks. It then loads a new execution state, performs protection checks, and begins the new task, all in about 17 microseconds. The task switch is invoked by executing an intersegment jump or call which refers to a Task State Segment (TSS) or a task gate descriptor in the LDT or GDT. An INT n instruction, exception, trap, or external interrupt may also invoke a task switch via a task gate descriptor in the associated IDT.

Each task must have an associated Task State Segment. This segment contains an image of the system’s conditions as they exist for that task. The TSS for the current task, the one being executed by the system at the moment, is identified by a special register called the Task Register (TR). This register contains a selector referring to the task state segment descriptor that defines the current TSS. A hidden base and limit register connected to the TR are loaded whenever TR is updated. Returning from a task is accomplished with the IRET instruction, which returns control to the task that was interrupted by the switch. The current task’s state is saved in its TSS, and the previous task’s TSS is used to restore that task as the current one.

Control Registers

The 80386 has four "Control Registers" called CR0 through CR3. CR0 contains several bit flags as follows:

PG – When set to 1, causes the translation of linear addresses to physical addresses. Indicates that paging is enabled and virtual memory is being used.

ET – When set to 1, indicates that the 80387 math coprocessor is in use.

TS – When set to 1, indicates that the processor has switched tasks.

EM – When set to 1, causes a type 7 interrupt for the ESC (escape) instruction for the math coprocessor.

MP – When set to 1, indicates that the math coprocessor is present in the system.

PE – Selects protected mode of operation.

CR1 is not used by the ‘386. CR2 contains page fault linear addresses for the virtual memory manager. CR3 contains a pointer to the base of the page directory for virtual memory management.

Switching to Protected Mode

At reset, the 80386 begins operation in Real Mode. This is to allow setup of various conditions before the switch to Protected Mode is made. The actual switch is accomplished by setting the PE bit in CR0. The following steps are needed.

1. Initialize the interrupt descriptor table to contain valid interrupt gates for at least the first 32 interrupt types. The IDT can contain 256 8-byte gates.

2. Set up the GDT so that it contains a null descriptor at position 0, and valid descriptors for at least one code, one data, and one stack segment.

3. Switch to protected mode by setting PE to 1.

4. Execute a near JMP to flush the internal instruction queue and to load the TR with the base TSS descriptor.

5. Load all the data selectors with initial values.

6. The processor is now running in Protected Mode using the given GDT and IDT.
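Step 3 amounts to setting a single bit in CR0 while leaving the other flags alone. The sketch below models CR0 as a plain integer to show the bit manipulation. The bit positions come from Intel's documented CR0 layout rather than from these notes, and the names `CR0_FLAGS`, `set_protected_mode` and `cr0_flag_names` are hypothetical. On the real processor the register is read and written with privileged MOV instructions, not with a function call.

```python
# CR0 flag bit positions (Intel's documented layout, not given in the notes)
CR0_FLAGS = {
    "PE": 1 << 0,    # protected mode enable
    "MP": 1 << 1,    # math coprocessor present
    "EM": 1 << 2,    # emulate coprocessor
    "TS": 1 << 3,    # task switched
    "ET": 1 << 4,    # coprocessor (extension) type
    "PG": 1 << 31,   # paging enabled
}


def set_protected_mode(cr0):
    """Return the CR0 value with PE set and every other bit unchanged."""
    return cr0 | CR0_FLAGS["PE"]


def cr0_flag_names(cr0):
    """List the names of the flags that are set in a CR0 value."""
    return [name for name, mask in CR0_FLAGS.items() if cr0 & mask]


real_mode_cr0 = 0x00000010                  # illustrative value: only ET set
protected_cr0 = set_protected_mode(real_mode_cr0)
```

An OR with the PE mask is used rather than writing a constant so that flags already set during real-mode setup are preserved.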

In the case of a multitasking system, an alternate approach is to load the GDT with at least two TSS descriptors in addition to the code and data descriptors needed for the first task. The first JMP following the setting of the PE bit will cause a task switch that loads all the data needed from the TSS of the first task to be entered. Multitasking is then initialized.

VIRTUAL 8086 MODE

The third mode of operation provided by the 80386 is that of Virtual 8086 Mode. Once in protected mode, one or more virtual 8086 tasks can be initiated. Virtual 8086 tasks appear to be like real mode. The task is limited to 1 megabyte of memory whose address space is located at 0 through FFFFFh; the segment registers are used as they are in real mode (no selectors or lookup tables are involved). Each of the virtual 8086 tasks is given a certain amount of time using a time-slice algorithm typical of mainframes (timesharing). The software for such tasks is written as if it were to run in a real mode address space. However, using paging, multiple such sessions can be located anywhere in the virtual memory space of the 80386.











Windows NT and OS/2 use this technique to support one or more DOS sessions, or low-priority utilities such as a print spooler.

VIRTUAL MEMORY AND PAGING

Using selectors and tables, the 80386 generates what Intel defines as a linear address as a means of locating data or instructions for real mode or for the current task in protected mode. If the system is not using virtual memory or paging, then the linear address is the physical address of the desired data or bytes, and is forwarded to the pins of the device to become the physical address.

Paging allows a level of interpretation to be inserted between the linear address and the physical address. The linear address is passed to the paging unit, which converts it to a physical address that may differ from the linear one. This allows several options, including 1) mapping a linear address to some other physical address according to the needs of a multitasking operating system to place tasks at convenient locations, or 2) mapping linear addresses to memory that does not exist in the system, but might be replaced by disk space.

Paging logically divides the available virtual space into "pages" that are 4 Kbytes in size. Three elements are needed to implement paging. These are the page directory, the page table, and the actual physical memory page. Values in these tables are obtained by combining parts of the linear address with values from the tables which point to other values.

The page directory is a table of as many as 1,024 4-byte entries. (This is a maximum number; most systems use far fewer entries.) The base of the page directory is determined by the value contained in CR3. An offset into the directory is created from the uppermost 10 bits (positions 22-31) of the linear address. At this offset in the directory, we find a pointer to the base of a page table. This means that there can be as many as 1,024 page tables in a system.

There are 1,024 entries possible in each page table. The middle 10 bits of the linear address (bit positions 12 through 21) are used as an offset into the selected page table. The value thus determined is a pointer to the base of a 4K memory page. The offset into the page to locate the specific data needed is contained in the lower 12 bits of the linear address.
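The two-level lookup described above can be sketched as follows. This is an illustrative model only: physical memory is a hypothetical Python dictionary mapping a physical address to a 32-bit table entry, the function names are invented, and the low 12 bits of each entry are simply masked off without checking any flags they may hold.

```python
FRAME_MASK = 0xFFFFF000   # upper 20 bits: base of a 4K table or page


def split_linear(address):
    """Split a 32-bit linear address into its three fields."""
    directory_index = (address >> 22) & 0x3FF   # bits 22-31
    table_index = (address >> 12) & 0x3FF       # bits 12-21
    page_offset = address & 0xFFF               # bits 0-11
    return directory_index, table_index, page_offset


def translate(address, cr3, memory):
    """Walk page directory and page table to get a physical address."""
    directory_index, table_index, page_offset = split_linear(address)
    # Each entry is 4 bytes, so the index is scaled by 4
    directory_entry = memory[(cr3 & FRAME_MASK) + 4 * directory_index]
    table_entry = memory[(directory_entry & FRAME_MASK) + 4 * table_index]
    return (table_entry & FRAME_MASK) | page_offset


# Hypothetical tables: directory at 1000h, one page table at 2000h
memory = {
    0x1000 + 4 * 1: 0x00002007,   # directory entry 1 -> page table at 2000h
    0x2000 + 4 * 3: 0x00005007,   # table entry 3 -> page at 5000h
}
physical = translate(0x00403ABC, 0x1000, memory)
```

In this made-up setup, linear address 00403ABCh splits into directory index 1, table index 3 and offset ABCh.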

The entries in the page directory and page tables are identical. They contain 20 bits of addressing, and the following flags:

D or DIRTY bit: This bit is not used in the page directory. In the page table entries, it indicates that the 4K area defined by this entry has been written to, and so must be saved (as to disk) if the area is to be reused for something else.

A or ACCESSED bit: This bit is set to a 1 when the processor accesses the 4K page.

R/W or Read/Write and U/S or User/Supervisor bits: These are used in conjunction with privilege management.

P or PRESENT bit: This bit when set to 1 indicates that the referenced page is present in memory. If 0, it can be used to indicate that the page is not in RAM, e.g., is on disk.

Performance of the paging system would be affected if the system needed to reference memory tables each time a reference to RWM was made. To offset this, a Translation Lookaside Buffer (TLB) is provided. This is a 4-way set-associative cache that contains entries for the last 32 pages needed by the processor. This provides immediate information about 98% of the time, causing only 2% of memory accesses to make the page directory-page table translation.

HARDWARE HIGHLIGHTS

The instructor will provide you with illustrations of the timing sequences for the various read and write cycles available on the 80386. There are two items of interest that we note here.

Address Pipelining











Under non-pipelined conditions, the bus signals of the ‘386 function very much like any other processor. A machine cycle consists of two T-states, T1 and T2. These are defined by the following edge of the system clock signal. At the beginning of T1, an address appears on the BE0# through BE3# and A2 through A31 lines, along with various control lines. The address is held valid until very near the end of T2. The ADS# line is pulled low (active) during T1 to indicate that the address bus contains a valid address; the ADS# line is pulled high (negated) during T2. The data is passed in or out at the transition between the end of T2 of the current cycle and the start of T1 of the following machine cycle. During this time, the NA# line is maintained high (negated).

In pipelining, the address bits are available ½ machine cycle earlier than with no pipelining. The ADS# line is pulled low during T2 of a cycle rather than T1, indicating that during T2, the address of the data to be exchanged during the next machine cycle is available. Pipelining is initiated by the incoming line NA#, which is controlled by the memory subsystem. If pulled low during a T1, the memory expects that the address of the next bytes needed will be available ½ cycle early.

The purpose of pipelining is to minimize the need for wait states. The time needed to read or write data remains the same. However, the time an address is available before the data is expected is lengthened so that a wait state may not be needed. The memory subsystem has to be designed to work within these parameters.

Dynamic Bus Sizing

Normally, the 80386 expects data to be transferred on a 32-bit wide data bus. However, it is possible to force the system to transfer 32-bit data as two 16-bit quantities in two successive bus cycles. This is initiated by the BS16# signal coming from the memory or I/O device subsystem. This line is pulled low during the middle of T2. It indicates to the processor that 32-bit data will be sent as two 16-bit words, with D0-D15 on the first transfer and D16-D31 on the second. The data is transferred on the D0-D15 bus lines; the D16-D31 lines are ignored.

INSTRUCTION SET

The instruction set of the 80386 is compatible with that of the 8086, and programs written for that processor can run on the ‘386 without modification. However, the ‘386 includes extensions of the base instruction set to support 32-bit data processing and operation in protected mode. The reader is referred to the Intel documentation for full particulars on each instruction and its possible versions. Here we discuss the essential aspects of instruction organization.

Instructions vary in length, depending upon how much information must be given for the instruction, the addressing modes used, and the location of data to be processed. The generic instruction contains the following:

BYTE 1: This is the operation (OP) code for the instruction. Bit position 0 may be interpreted as the "w" bit, where w=0 indicates byte mode and w=1 indicates word mode. Also, bit position 1 may be interpreted as the operation direction bit in double operand instructions as follows:

d    Direction of Operation

0    Register/Memory <- Register
     "reg" field indicates source operand
     "mod r/m" or "mod ss index base" indicates destination operand

1    Register <- Register/Memory
     "reg" field











10 010 SS: [BP+SI+d16] DS: [EDX+d32]

10 011 SS: [BP+DI+d16] DS: [EBX+d32]

10 100 DS: [SI+d16] sib is present

10 101 DS: [DI+d16] SS: [EBP+d32]

10 110 SS: [BP+d16] DS: [ESI+d32]

10 111 DS: [BX+d16] DS: [EDI+d32]

          16-Bit Reg, w=0   16-Bit Reg, w=1   32-Bit Reg, w=0   32-Bit Reg, w=1

11 000    AL                AX                AL                EAX
11 001    CL                CX                CL                ECX
11 010    DL                DX                DL                EDX
11 011    BL                BX                BL                EBX
11 100    AH                SP                AH                ESP
11 101    CH                BP                CH                EBP
11 110    DH                SI                DH                ESI
11 111    BH                DI                BH                EDI

BYTE 4 (optional): This is the "sib" byte and is not found in the 8086. It appears only in some 80386 instructions as needed. This byte supports the "scaled index" addressing mode. Bit positions 0-2 identify a general register to be used as a base value. Bit positions 3-5 identify a general register to be used as the index register. Bit positions 6 and 7 identify a scaling factor by which the value in the index register is multiplied, as follows:

ss Scale Factor

00 1

01 2

10 4

11 8

The index field of the sib byte is interpreted as follows:

Index Index Register











000 EAX

001 ECX

010 EDX

011 EBX

100 No index register used

101 EBP

110 ESI

111 EDI
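The field positions of the sib byte, together with the scale factor and index register tables above, can be sketched as a small decoder. This is a hedged illustration with hypothetical names; the base field is returned as a raw number because its meaning also depends on the mod field of the mod r/m byte.

```python
SCALE_FACTOR = {0b00: 1, 0b01: 2, 0b10: 4, 0b11: 8}

# Index field encodings from the table above; 100 means no index register
INDEX_REGISTER = ["EAX", "ECX", "EDX", "EBX", None, "EBP", "ESI", "EDI"]


def decode_sib(sib):
    """Split a sib byte into (scale factor, index register, base field)."""
    ss = (sib >> 6) & 0b11       # bit positions 6 and 7
    index = (sib >> 3) & 0b111   # bit positions 3-5
    base = sib & 0b111           # bit positions 0-2
    return SCALE_FACTOR[ss], INDEX_REGISTER[index], base


# 98h = 10 011 000: scale by 4, index EBX, base field 000
scaled = decode_sib(0x98)
# 24h = 00 100 100: no index register, base field 100
unscaled = decode_sib(0x24)
```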

The mod field of the mod r/m byte taken with the base value of the sib byte generates the following scaled indexing modes:

Mod base Effective Address

00 000 DS: [EAX + (scaled index)]

00 001 DS: [ECX + (scaled index)]

00 010 DS: [EDX + (scaled index)]

00 011 DS: [EBX + (scaled index)]

00 100 SS: [ESP + (scaled index)]

00 101 DS: [d32 + (scaled index)]

00 110 DS: [ESI + (scaled index)]

00 111 DS: [EDI + (scaled index)]

01 000 DS: [EAX + (scaled index) + d8]

01 001 DS: [ECX + (scaled index) + d8]

01 010 DS: [EDX + (scaled index) + d8]

01 011 DS: [EBX + (scaled index) + d8]

01 100 SS: [ESP + (scaled index) + d8]

01 101 SS: [EBP + (scaled index) + d8]











P/PE Parity/Parity Even 1010

NP/PO No Parity/Parity Odd 1011

L/NGE Less Than/Not Greater or Equal 1100

NL/GE Not Less Than/Greater or Equal 1101

LE/NG Less Than or Equal/Not Greater Than 1110

NLE/G Not Less Than or Equal/Greater Than 1111

Pentium

About the Pentium Architecture
------------------------------

-- It is not a load/store architecture.

-- The instruction set is huge! We go over only a fraction of the instruction set. The text only presents a fraction.

-- There are lots of restrictions on how instructions/operands are put together, but there is also an amazing amount of flexibility.

Registers
---------

The Intel architectures as a set just do not have enough registers to satisfy most assembly language programmers. Still, the processors have been around for a LONG time, and they have a sufficient number of registers to do whatever is necessary.

For our (mostly) general purpose use, we get

   32-bit    16-bit    8-bit                  8-bit
                       (high part of 16)      (low part of 16)

   EAX       AX        AH                     AL
   EBX       BX        BH                     BL
   ECX       CX        CH                     CL
   EDX       DX        DH                     DL











of a register.

Example instruction:

mov eax, [esp + 4]

The second operand uses base displacement mode. The instruction contains a constant. That constant is added to the contents of register esp to form an effective address. The contents of memory at the effective address are copied into register eax.

-- base-indexed mode -- (Intel's name)
The effective address is the sum of the contents of two registers.


mov eax, [esp][esi]

The contents of registers esp and esi are added to form an effective address. The contents of memory at the effective address are copied into register eax.

Note that there are restrictions on the combinations of registers that can be used in this addressing mode.

-- PC relative mode --
The effective address is the sum of the contents of the PC and a constant contained within the instruction.


jmp a_label

The contents of the program counter are added to an offset that is within the machine code for the instruction. The resulting sum is placed back into the program counter. Note that from the assembly language it is not clear that a PC relative addressing mode is used. It is the assembler that generates the offset to place in the instruction.
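The offset calculation done by the assembler, and the matching addition done by the processor, can be sketched as below. The function names are hypothetical. One detail assumed here from how x86 relative jumps behave: by the time the offset is added, the program counter already holds the address of the next instruction, so the instruction's own length enters the calculation. The 2-byte length in the example is an assumption for a short jump.

```python
def encode_offset(instruction_address, instruction_length, target):
    """Offset the assembler would place in the instruction (sketch)."""
    next_instruction = instruction_address + instruction_length
    return target - next_instruction


def jump_target(instruction_address, instruction_length, offset):
    """Address the processor ends up at after adding the offset to the PC."""
    next_instruction = instruction_address + instruction_length
    return (next_instruction + offset) & 0xFFFFFFFF


def as_unsigned_byte(offset):
    """8-bit two's complement encoding of a small signed offset."""
    if not -128 <= offset <= 127:
        raise ValueError("offset does not fit in 8 bits")
    return offset & 0xFF


# A 2-byte jump at 1010h back to a label at 1000h (illustrative addresses)
backward = encode_offset(0x1010, 2, 0x1000)
encoded = as_unsigned_byte(backward)
landed = jump_target(0x1010, 2, backward)
```

A backward jump produces a negative offset, which is why the offset is stored in two's complement form.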

Instruction Set
----------------











Generalities:

-- Many (most?) of the instructions have exactly 2 operands. If there are 2 operands, then one of them will be required to use register mode, and the other will have no restrictions on its addressing mode.

-- There are most often ways of specifying the same instruction for 8-, 16-, or 32-bit operands. I left out the 16-bit ones to reduce presentation of the instruction set. Note that on a 32-bit machine, with newly written code, the 16-bit form will never be used.

Meanings of the operand specifications:

   reg    - register mode operand, 32-bit register
   reg8   - register mode operand, 8-bit register
   r/m    - general addressing mode, 32-bit
   r/m8   - general addressing mode, 8-bit
   immed  - 32-bit immediate is in the instruction
   immed8 - 8-bit immediate is in the instruction
   m      - symbol (label) in the instruction is the effective address

Data Movement
-------------

   mov    reg, r/m      ; copy data
          r/m, reg
          reg, immed
          r/m, immed

   movsx  reg, r/m8     ; sign extend and copy data

   movzx  reg, r/m8     ; zero extend and copy data

   lea    reg, m        ; get effective address
                        ; (A newer instruction, so its format is much
                        ; more restricted than the other ones.)

EXAMPLES:

   mov   EAX, 23               ; places 32-bit 2's complement immediate 23
                               ; into register EAX

   movsx ECX, AL               ; sign extends the 8-bit quantity in register
                               ; AL to 32 bits, and places it in ECX

   mov   dword ptr [esp], -1   ; places 32-bit value -1 into memory, address
                               ; given by contents of esp

   lea   EBX, loop_top         ; put the address assigned (by the assembler)
                               ; to label loop_top into register EBX
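The difference between movsx and movzx comes down to what fills the upper 24 bits of the destination. The sketch below models both on plain integers; the function names simply mirror the mnemonics and are not a real API.

```python
def movzx(value8):
    """Zero extend an 8-bit value to 32 bits: upper bits become 0."""
    return value8 & 0xFF


def movsx(value8):
    """Sign extend an 8-bit value to 32 bits: upper bits copy bit 7."""
    value8 &= 0xFF
    if value8 & 0x80:                 # sign bit set: value is negative
        return value8 | 0xFFFFFF00
    return value8


# FFh is -1 as an 8-bit two's complement number
sign_extended = movsx(0xFF)
zero_extended = movzx(0xFF)
```

Sign extension keeps the numeric value of a two's complement quantity, while zero extension keeps the value of an unsigned one.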

Integer Arithmetic
------------------

   add   reg, r/m       ; two's complement addition
         r/m, reg
         reg, immed
         r/m, immed

   inc   reg            ; add 1 to operand
         r/m

   sub   reg, r/m       ; two's complement subtraction
         r/m, reg
         reg, immed
         r/m, immed

   dec   reg            ; subtract 1 from operand
         r/m

   neg   r/m            ; get additive inverse of operand

   mul   eax, r/m       ; unsigned multiplication
                        ; edx||eax <- eax * r/m

   imul  r/m            ; 2's comp. multiplication
                        ; edx||eax <- eax * r/m
         reg, r/m       ; reg <- reg * r/m
         reg, immed     ; reg <- reg * immed

   div   r/m            ; unsigned division
                        ; does edx||eax / r/m
                        ; eax <- quotient
                        ; edx <- remainder

   idiv  r/m            ; 2's complement division
                        ; does edx||eax / r/m
                        ; eax <- quotient
                        ; edx <- remainder

   cmp   reg, r/m       ; sets EFLAGS based on
         r/m, immed     ; first operand - second operand
         r/m8, immed8
         r/m, immed8    ; sign extends immed8 before subtract

EXAMPLES:

   neg  [eax + 4]       ; takes doubleword at address eax+4
                        ; and finds its additive inverse, then places
                        ; the additive inverse back at that address
                        ; the instruction should probably be
                        ; neg dword ptr [eax + 4]

   inc  ecx             ; adds one to contents of register ecx, and
                        ; result goes back to ecx
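The edx||eax notation used for mul and div above means a 64-bit quantity whose upper half is in edx and whose lower half is in eax. A minimal sketch of the unsigned forms follows, with registers modeled as Python integers and hypothetical function names. Where the real processor would raise a divide error, this model raises a Python exception instead.

```python
MASK32 = 0xFFFFFFFF


def mul(eax, operand):
    """Unsigned multiply: returns (edx, eax) holding the 64-bit product."""
    product = (eax & MASK32) * (operand & MASK32)
    return product >> 32, product & MASK32


def div(edx, eax, operand):
    """Unsigned divide of edx||eax: returns (quotient, remainder)."""
    operand &= MASK32
    if operand == 0:
        raise ZeroDivisionError("divide by zero")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, operand)
    if quotient > MASK32:
        # The quotient must fit in eax
        raise OverflowError("quotient does not fit in 32 bits")
    return quotient, remainder   # go to eax and edx respectively


high, low = mul(0xFFFFFFFF, 2)
quotient, remainder = div(high, low, 2)
```

Dividing the product by the same operand recovers the original eax value, which shows why the two instructions share the edx||eax pair.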

Logical
-------

   not   r/m            ; logical not

   and   reg, r/m       ; logical and
         reg8, r/m8
         r/m, reg
         r/m8, reg8
         r/m, immed
         r/m8, immed8

   or    reg, r/m       ; logical or
         reg8, r/m8
         r/m, reg
         r/m8, reg8
         r/m, immed
         r/m8, immed8

   xor   reg, r/m       ; logical exclusive or
         reg8, r/m8
         r/m, reg
         r/m8, reg8
         r/m, immed
         r/m8, immed8

   test  r/m, reg       ; logical and to set EFLAGS
         r/m8, reg8
         r/m, immed
         r/m8, immed8

EXAMPLES:

   and  edx, 00330000h  ; logical and of contents of register
                        ; edx (bitwise) with 0x00330000,
                        ; result goes back to edx

Floating Point Arithmetic
-------------------------

Since the newer architectures have room for floating point hardware on chip, Intel defined a simple-to-implement extension to the architecture to do floating point arithmetic. In their usual zeal, they have included MANY instructions to do floating point operations.

The mechanism is simple. A set of 8 registers is organized and maintained (by hardware) as a stack of floating point values. ST refers to the stack top. ST(1) refers to the register within the stack that is next to ST. ST and ST(0) are synonyms.

There are separate instructions to test and compare the values of floating point variables.

   finit                ; initialize the FPU

   fld   m32            ; load floating point value
         m64
         ST(i)

   fldz                 ; load floating point value 0.0

   fst   m32            ; store floating point value
         m64
         ST(i)

   fstp  m32            ; store floating point value
         m64            ; and pop ST
         ST(i)

   fadd  m32            ; floating point addition
         m64
         ST, ST(i)
         ST(i), ST

   faddp ST(i), ST      ; floating point addition
                        ; and pop ST
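The stack behavior behind these instructions can be sketched with a small model. This is a toy under stated assumptions, not a description of the real hardware: the class `FpuStack` is hypothetical, it covers only the memory-operand forms plus faddp, and it raises a Python exception when all 8 registers are in use rather than modeling the processor's own error handling.

```python
class FpuStack:
    """Toy model of the 8-register floating point stack."""

    DEPTH = 8

    def __init__(self):
        self.regs = []            # regs[0] is ST, regs[i] is ST(i)

    def finit(self):
        self.regs.clear()         # start with an empty stack

    def fld(self, value):
        if len(self.regs) == self.DEPTH:
            raise OverflowError("all 8 stack registers are in use")
        self.regs.insert(0, float(value))   # pushed value becomes ST

    def fldz(self):
        self.fld(0.0)

    def fst(self):
        return self.regs[0]       # value that would be stored to memory

    def fstp(self):
        return self.regs.pop(0)   # store, then pop ST

    def fadd(self, value):
        self.regs[0] += float(value)        # ST <- ST + memory operand

    def faddp(self, i):
        self.regs[i] += self.regs[0]        # ST(i) <- ST(i) + ST
        self.regs.pop(0)                    # then pop ST


fpu = FpuStack()
fpu.finit()
fpu.fld(1.5)
fpu.fld(2.0)      # ST = 2.0, ST(1) = 1.5
fpu.faddp(1)      # ST(1) gets the sum and becomes the new ST
total = fpu.fst()
```

After the pop, the register that was ST(1) is addressed as ST, which is why a sequence of loads followed by faddp leaves the running sum on top of the stack.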

I/O
---

The only instructions which actually allow the reading and writing of I/O devices are privileged. The OS must handle these things. But, in writing programs that do something useful, we need input and output. Therefore, there are some simple macros defined to help us do I/O.

These are used just like instructions.

   put_ch   r/m         ; print character in the least significant
                        ; byte of 32-bit operand

   get_ch   r/m         ; character will be in AL

   put_str  m           ; print null terminated string given
                        ; by label m

Control Instructions
--------------------

These are the same control instructions that all started with the character 'b' in SASM.

   jmp   m              ; unconditional jump
   jg    m              ; jump if greater than 0
   jge   m              ; jump if greater than or equal to 0
   jl    m              ; jump if less than 0



