80x86 assembly language

Upload: ramon-rodriguez

Post on 04-Apr-2018


  • 7/29/2019 80x86 Assembly Language


    80x86 Assembly Language

    DOS COM File

    A COM file is the simplest type of program that you can write for DOS. You can run COM files on the earliest versions of DOS, right through to Windows 98, or in a DOS box on Windows NT. A COM file is simply a binary image of the program. It is always loaded at offset 100h in a segment of the operating system's choice. This segment depends on what other programs are already running, and what is already using memory. When the operating system loads a COM file, it allocates all available memory (within the first 640K) to the COM file. This is because, back in the days of DOS, only one program would run at any one time. When the COM file ends, control passes back to the operating system, and the memory is released and available once more for other programs to use.

    Note that DOS is a non-intrusive operating system, which basically means that it does not protect itself from your program: if there is an error in your COM file, control may not return to the operating system, and the only way out is to reboot the machine. This happens quite frequently when you start programming in assembly (it is known as the machine 'hanging'). This is less of a problem under Windows, as it is a multitasking operating system, and the only part that needs to be shut down is the DOS box, not the whole computer. For this reason, writing programs under Windows, as opposed to plain DOS, is sometimes easier.

    The first 100h bytes of the segment are called the Program Segment Prefix (PSP). The PSP contains information that may be useful to the program, and some data areas that are used by the operating system. For example, the command line is stored in the PSP, so getting any command line arguments from a COM file is simply a case of reading the appropriate part of the PSP.

    Before writing your first COM file, it is worth understanding the language that you will be using. Assembly language is basically the lowest level language that you can get. An assembly language instruction, when assembled, is coded directly into a sequence of bytes. This direct coding gives you the power to do exactly what you want. High level languages, such as C or BASIC, take one line of high level code and generate many machine instructions from it, sometimes hundreds. This makes high level languages easy and quick to write code in, but the output is often less optimised, and larger and slower, than hand-written assembly. When you want that control, the natural choice is assembly language. To write assembly language, you need an assembler. One of the best available is Borland's Turbo Assembler (TASM). It costs about 50 new, but you may be able to find an old version on the internet.

    First of all, I will present a 'Hello world' COM file. Its purpose is just to display 'Hello world!' at the command prompt, and then to return control to the operating system. Here is the source:

    Cseg    segment                  ; Call our segment 'cseg'
            assume  cs:cseg, ds:cseg ; cs and ds both point at this segment in a COM file
            org     100h             ; All COM files are loaded at 100h.
    start:                           ; 'start' is a label. It points to where our code starts.
            mov     ah, 9            ; Let the 8 bit register 'AH' equal 9
                                     ; (Function 9, Interrupt 21h is the 'Print string' command)
            mov     dx, offset Hello ; Let the 16 bit register 'DX' equal the address in
                                     ; memory of 'Hello' (This is our string)
            int     21h              ; Call Interrupt 21h
                                     ; ah = 9 (Print string)
                                     ; dx = address of string to print (See Interrupts)
            int     20h              ; Call Interrupt 20h (This terminates a COM file).
    Hello   db      'Hello world!$'  ; Here is our string. Note that this particular DOS function
                                     ; uses '$' to mark the end of the string. Do not remove the
                                     ; '$', as then DOS will not know where the string ends, and
                                     ; will print garbage!
    Cseg    ends                     ; Mark the end of the segment
            end     start            ; Signify end of program (start was defined as a label at
                                     ; the start of our code - see above)

    As you can see, the actual source is very short (everything after a ';' is a comment). If you type the above in and save it as 'hello.asm' (assembly source files usually end in .asm), then you have just written a Hello World program. To assemble the source into an actual COM file, we need to do the following:

    tasm hello
    tlink hello /t

    There will now be a hello.com file in the directory. Type 'hello', and you will see it run! The '/t' tells the linker to produce a COM file. (You can look up all the command line options in the TASM manuals, or just type tlink on its own and it will list most of them.)
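
    To make the 'direct coding' point concrete, here is a sketch of the same Hello World program written as the raw bytes that the instructions encode to. The byte values are the standard 8086 encodings for these instructions. The layout, and the 0109h string address that follows from it, are my own illustration and are not copied from actual TASM output.

```asm
Cseg    segment
        assume  cs:cseg, ds:cseg
        org     100h
start:
        db      0B4h, 09h          ; mov ah, 9        (2 bytes, at offset 100h)
        db      0BAh, 09h, 01h     ; mov dx, 0109h    (3 bytes, low byte of the address first)
        db      0CDh, 21h          ; int 21h          (2 bytes)
        db      0CDh, 20h          ; int 20h          (2 bytes)
        db      'Hello world!$'    ; the string lands at 100h + 9 bytes of code = 109h
Cseg    ends
        end     start
```

    Assembled and linked with the same two commands, this should behave like the mnemonic version. The point is only that each instruction is a handful of bytes that you could, in principle, write by hand.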

    So what's in this PSP then?

    As I said above, the Program Segment Prefix is where a program can get its command line arguments, if there were any. It also holds some other information:

    Address  Size       Description
    0000     Word       Machine code Int 20h instruction. You can jump here to terminate.
    0002     Word       Top of memory in segment form (amount of memory available to this program)
    0004     Byte       Reserved for DOS (usually 0)
    0005     5 Bytes    Machine code for long call to CP/M function dispatcher (old stuff!)
    000a     Dword      INT 22h terminate address
    000e     Dword      INT 23h Ctrl-Break address
    0012     Dword      INT 24h critical error exit address
    0016     Word       Parent process segment address (undocumented)
    0018     20 Bytes   File handle array
    002c     Word       Segment address of environment block
    002e     Dword      SS:SP on entry to last Int 21h (undocumented)
    0032     Word       Handle array size (undocumented)
    0034     Dword      Handle array pointer (undocumented)
    0038     Dword      Pointer to previous PSP
    003c     20 Bytes   Unused in DOS before 4.01
    0050     3 Bytes    DOS function dispatcher (CDh 21h CBh): machine code for int 21h, far ret
    0053     9 Bytes    Unused
    005c     36 Bytes   Default unopened FCB #1
    006c     20 Bytes   Default unopened FCB #2
    0080     Byte       Count of characters in command tail (also default DTA)
    0081     127 Bytes  All characters entered after the program name, followed by a CR byte

    As you can see from the above table, much of the PSP is reserved, or undocumented and used by DOS. The main areas of interest are the command line arguments at offset 0080h, and the address of the environment block. (This is where programs may pick up any DOS variables from: PATH, COMSPEC, or any program specific variables. For example, SET BLASTER=A220 I5 etc. would be stored in the environment segment.)
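
    As a rough sketch of how the environment block pointer might be used, the program below prints the first string found in the environment segment. It assumes the usual layout of the block (strings terminated by a 0 byte, starting at offset 0) and uses DOS function 2 (display the character in DL), neither of which is described above, so treat it as an illustration rather than a finished routine.

```asm
Cseg    segment
        assume  cs:cseg, ds:cseg
        org     100h
start:
        mov     es, word ptr ds:[2Ch] ; es = environment segment (PSP offset 002c)
        xor     si, si                ; assumed: the first string starts at offset 0
next:   mov     dl, es:[si]           ; fetch the next character
        cmp     dl, 0
        je      done                  ; a 0 byte ends the string
        mov     ah, 2                 ; DOS function 2 - display the character in DL
        int     21h
        inc     si
        jmp     next
done:   int     20h                   ; back to DOS
Cseg    ends
        end     start
```

    What gets printed depends on the machine: it is simply whichever variable happens to be first in the environment.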

    OK, so now say we want to expand the Hello World program. One easy way to expand it is that instead of just printing out a fixed message 'Hello world!', it prints out the command line. So if we assemble the program to hello.com, and type 'hello Jimmy', we could get the response 'You typed in a command line of: Jimmy'. So, let's see the program for it...

    Cseg    segment
            assume  cs:cseg, ds:cseg, es:cseg
            org     100h
    start:
            xor     cx, cx                  ; cx = 0
            mov     cl, byte ptr ds:[80h]   ; Get length of command tail from PSP
            cmp     cl, 0
            je      nocmln                  ; There isn't a command line - exit!
            mov     si, 81h                 ; si = source address (Command line)
            mov     di, offset inserth      ; di = destination address (Our buffer)
            cld                             ; Clear direction flag (= forwards)
            rep     movsb                   ; Move the memory (cx bytes from si to di)
            mov     byte ptr [di], '$'      ; Put a '$' at the end so that DOS
                                            ; knows where the end is.
            mov     ah, 9
            mov     dx, offset Hello
            int     21h                     ; Print the string
    nocmln: int     20h                     ; Exit
    Hello   db      'You typed in a command line of:'
    inserth db      128 dup(0)              ; A 128 byte buffer filled with 0's
    Cseg    ends
            end     start

    A few things to notice about the above program. Firstly, the use of 'rep movsb'. This is a single instruction, yet it can move a lot of data around. It will move 'cx' bytes of data from ds:si to es:di. There are also more 'string' instructions that can be used to compare strings and perform other useful tasks. Each one can be used in byte mode, word mode, or (on the 80386 and above) double word mode.

    Here are the string instructions. (By the way, the rep at the start is optional. It means 'repeat cx times'. If you omit the rep, the instruction executes only once, moving one byte or whatever.)

    movsb  movsw  movsd    Move some memory around (Byte/Word/Dword variations)
    cmpsb  cmpsw  cmpsd    Compare 2 areas of memory
    stosb  stosw  stosd    Store AL/AX/EAX in memory (Used to init arrays to 0, etc)
    scasb  scasw  scasd    Find a match for AL/AX/EAX in memory (eg. find end of an asciiz string)

    Another thing to notice is the use of dup. This is a useful assembler operator that can be used to allocate large buffers. The syntax is 'Length dup(Value)'. If you want words or dwords, do the following:

    dw 100 dup(3) ; 100 Words, all with the value 3
    dd 3 dup(0)   ; 3 Dwords, initialised to 0.
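
    To show one of the other string instructions together with dup, here is a small sketch that fills a buffer using 'rep stosb' and then prints it. The buffer name, its size of 10 and the '*' fill character are arbitrary choices made for this example.

```asm
Cseg    segment
        assume  cs:cseg, ds:cseg, es:cseg
        org     100h
start:
        cld                        ; direction = forwards
        mov     di, offset buffer  ; es:di = destination (es = ds when a COM file starts)
        mov     al, '*'            ; the value to store
        mov     cx, 10             ; repeat count
        rep     stosb              ; store AL at es:di, cx times
        mov     ah, 9
        mov     dx, offset buffer
        int     21h                ; print the filled buffer
        int     20h
buffer  db      10 dup(0)          ; 10 bytes, initialised to 0
        db      '$'                ; terminator for DOS function 9
Cseg    ends
        end     start
```

    Without the rep prefix, the stosb would store a single '*' and the rest of the buffer would stay at 0.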

    My First COM File TSR

    Now that we have the basics of COM file programming sorted out, sometimes it is fun to keep your program resident after it exits. You might think 'Why? How will it get to do anything? It has returned control to DOS'.

    The name for a program that remains resident and active after it has terminated is a 'Terminate and Stay Resident' program, or 'TSR', hence the title above.

    The way these programs stay active after returning control to DOS is to hook an interrupt. The other way to stay active is to be a device driver (see SYS files after you have read this). In this case, DOS may call you explicitly to perform a task.

    DOS provides 2 functions that we can use to hook interrupts. They are 'Get Interrupt Vector' (Int 21h, function 35h) and 'Set Interrupt Vector' (Int 21h, function 25h). The other thing that you have to do is tell DOS not to overwrite your program after you return control to the operating system. There are a few ways you can do this, but the easiest is by using Int 27h.

    If you are not familiar with interrupts, let me explain a few basics. There is an interrupt table, which simply holds addresses. When the CPU executes an interrupt instruction, the address is fetched from this table, and execution continues from that address. When the interrupt finishes, control is returned to the instruction after the original interrupt instruction. To hook an interrupt, all we do is make a copy of the address in the table, then set the address in the table to point to our code. When our code is finished, we jump to the old address in the table, so that the old interrupt handler is still called.
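
    The sketch below looks at the interrupt table directly. It relies on details that are not stated above: in real mode the table starts at address 0000:0000, and each entry is 4 bytes (offset word first, then segment word), so the entry for Int 8 sits at 0000:0020h. The program asks DOS for the Int 8 address with function 35h and compares the answer with the table entry. The message labels are made up for this example.

```asm
Cseg    segment
        assume  cs:cseg, ds:cseg
        org     100h
start:
        mov     ax, 3508h          ; DOS function 35h - get the vector for Int 8
        int     21h                ; es:bx = handler address according to DOS
        mov     dx, es             ; keep the segment DOS reported in dx
        xor     ax, ax
        mov     es, ax             ; es = 0, the segment of the interrupt table
        mov     si, 8 * 4          ; 4 bytes per entry, so Int 8 is at offset 20h
        cmp     bx, es:[si]        ; first word of the entry = offset
        jne     differ
        cmp     dx, es:[si+2]      ; second word of the entry = segment
        jne     differ
        mov     dx, offset msg_same
        jmp     show
differ: mov     dx, offset msg_diff
show:   mov     ah, 9
        int     21h                ; print the result
        int     20h
msg_same db     'Table entry matches DOS$'
msg_diff db     'Table entry differs$'
Cseg    ends
        end     start
```

    On plain DOS the two should agree. Going through the DOS functions is still the better habit for changing a vector, since writing both words of an entry by hand has to be protected against an interrupt arriving halfway through.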

    As well as Software Interrupts (called by a program using the Int instruction), there are hardware interrupts. These are called by other chips and hardware, to tell the CPU that something has happened. They are usually referred to as IRQs. Two of the most useful are IRQ 0, which is the timer interrupt, and IRQ 1, which is the keyboard interrupt.

    IRQ 0 can be set to execute periodically by the timer chip, and in DOS, the chip is set up to interrupt approximately 18.2 times per second. This interrupt is how the DOS clock works. There is a handler here which simply updates the DOS clock. As it is called 18.2 times a second, it adds 1/18.2 seconds to the current time (the real algorithm is a little more complex, but that is the idea).

    IRQ 1 is the keyboard interrupt, and is called each time something happens on the keyboard (key pressed, key released, key repeat). This is the interrupt you trap if you want to have a popup key sequence.

    As standard, IRQ 0 is mapped to Software Interrupt 8, IRQ 1 is mapped to Software Interrupt 9, and so on. This mapping is changeable, but in DOS it is pretty much fixed.

    Right... Enough of my boring commentary, let's look at the skeleton for a COM TSR program.

    Cseg    segment
            assume  cs:cseg, ds:cseg, es:cseg
                                      ; Assume that cs, ds and es are all in this segment.
            org     100h              ; Loads at 100h
    start:  cli                       ; Disallow any Hardware Interrupts
            mov     ax, 3508h         ; Dos function 35h - Get Interrupt vector, AL = Int 8h
            int     21h               ; Get address for Interrupt 8 into es:bx
            mov     old_seg, es       ; Save the segment
            mov     old_off, bx       ; Save the offset
            mov     ax, 2508h         ; Dos function 25h - Set interrupt vector, AL = Int 8h
            mov     dx, offset myint8 ; ds:dx points to my interrupt handler
            int     21h               ; Set address for Interrupt 8 handler
            sti                       ; We can allow Hardware Interrupts now.
            mov     dx, offset endofprog ; dx points just past the last byte used by our TSR
            int     27h               ; TSR. This expects dx to mark the end of the resident part.

    myint8:                           ; *** This is our Interrupt handler. This piece of code will
                                      ; *** get called 18.2 times a second.
                                      ; *** VERY IMPORTANT:
                                      ; *** If you use a register here, you must save+restore it
                                      ; *** Otherwise, the computer might crash.
            push    es
            push    bx                ; Save es and bx, as we use them.
            mov     bx, 0b800h        ; bx = 0b800h (Segment of the video buffer in text modes)
            mov     es, bx            ; es = 0b800h (Segment of video buffer)
            xor     bx, bx            ; bx = 0
            inc     byte ptr es:[bx]  ; Increment the memory at b800:0000 (1st character ascii value)
            pop     bx
            pop     es                ; Restore the registers to their original values
            push    cs:old_seg        ; Note that DS is not set to our segment, so we use a 'cs'
            push    cs:old_off        ; override. We could push and pop ds, and set it up to point
            retf                      ; to our segment, but we are only using 2 words of data.
                                      ; A RETF takes two words off the stack, and passes
                                      ; control there.
                                      ; It is like a far jump, but here, we can push the values
                                      ; of the address we want to jump to. In this case, the old
                                      ; Interrupt handler.
    old_off dw      0
    old_seg dw      0                 ; Address of old handler.
    endofprog:                        ; Label to mark end of our program.
    Cseg    ends
            end     start

    The above will assemble using the same tasm and tlink commands as the rest of the COM files. When you run it from a standard color text mode DOS prompt, you will see that the top left character immediately starts cycling through all the possible ASCII characters. Even when you are typing at the DOS prompt, or using edit, it will carry on. This is because it has successfully hooked the timer interrupt, and is now being called 18.2 times a second. If you want to experiment a bit, try changing the interrupt number to 9. Then, the character will only change when something happens on the keyboard.

    To actually find out what happened on the keyboard, you need to communicate with the keyboard. This is quite easy, as the keyboard puts a code in an I/O port. By simply inputting from this port, you can read the code. The code is actually a scan code. Each key on the keyboard has a number assigned to it, so to check which key has been pressed, just compare with the number. The highest bit of the code signifies whether the key has been pressed or released: 0 for this bit means it has been pressed, 1 means it has been released. You can easily monitor keypresses, and look for pop-up key sequences etc.
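
    Here is a sketch of what reading the scan code might look like, built on the same TSR skeleton. The port number is not given above: I am assuming the standard PC keyboard data port, 60h. The choice of showing 'P' for a press and 'R' for a release in the top left corner is purely for illustration.

```asm
Cseg    segment
        assume  cs:cseg, ds:cseg, es:cseg
        org     100h
start:
        mov     ax, 3509h          ; get the current Int 9 (IRQ 1) vector
        int     21h
        mov     old_seg, es
        mov     old_off, bx
        mov     ax, 2509h          ; point Int 9 at our handler (ds:dx)
        mov     dx, offset myint9
        int     21h
        mov     dx, offset endofprog
        int     27h                ; stay resident

myint9:
        push    ax
        push    bx
        push    es                 ; save everything we use
        in      al, 60h            ; read the scan code (port 60h is an assumption)
        mov     bx, 0b800h
        mov     es, bx
        xor     bx, bx             ; es:bx = b800:0000, top left of the text screen
        test    al, 80h            ; is the highest bit set?
        mov     al, 'P'            ; assume a press (mov leaves the flags alone)
        jz      show               ; bit clear = key pressed
        mov     al, 'R'            ; bit set = key released
show:   mov     es:[bx], al
        pop     es
        pop     bx
        pop     ax
        push    cs:old_seg
        push    cs:old_off
        retf                       ; carry on into the old keyboard handler

old_off dw      0
old_seg dw      0
endofprog:
Cseg    ends
        end     start
```

    To look for a particular key, you would compare the value read into al (with the highest bit masked off) against that key's scan code before deciding what to do.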

    Obviously, you could use the above technique to hook more than one interrupt. Say you want to monitor for a pop-up sequence, but also do some work in the background (play music, for example): you should hook both IRQ 0 and IRQ 1. Just remember to save and restore all registers that you use in your interrupt handler, and that DS will not be set up. So if you try to reference a piece of your data, set up ds yourself (saving and restoring it as well), or use the cs: override to say 'It's in the same segment as this code'.
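
    As a sketch of the 'set up ds yourself' option, this variation of the timer TSR keeps a tick counter in its own data and only changes the top left character every 18 ticks, which is roughly once a second. The counter name 'ticks' and the value 18 are choices made for this example.

```asm
Cseg    segment
        assume  cs:cseg, ds:cseg, es:cseg
        org     100h
start:
        mov     ax, 3508h
        int     21h                ; es:bx = old Int 8 handler
        mov     old_seg, es
        mov     old_off, bx
        mov     ax, 2508h
        mov     dx, offset myint8
        int     21h                ; install our handler
        mov     dx, offset endofprog
        int     27h                ; stay resident

myint8:
        push    ds
        push    es
        push    bx                 ; save everything we use
        push    cs
        pop     ds                 ; ds = cs, so our data needs no override from here
        inc     ticks
        cmp     ticks, 18          ; roughly one second's worth of ticks
        jb      chain
        mov     ticks, 0
        mov     bx, 0b800h
        mov     es, bx
        xor     bx, bx
        inc     byte ptr es:[bx]   ; change the top left character
chain:  pop     bx
        pop     es
        pop     ds                 ; give the interrupted program its ds back
        push    cs:old_seg         ; ds is no longer ours, so use cs: again here
        push    cs:old_off
        retf                       ; continue with the old handler

ticks   dw      0
old_off dw      0
old_seg dw      0
endofprog:
Cseg    ends
        end     start
```

    Setting up ds costs a push and a pop on every call, so for a handler that touches only one or two variables the cs: override is probably still the simpler choice.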