CS 201 Computer Systems Programming

Chapter 11 “x86 Microsoft Assembler”

Herbert G. Mayer, PSU CS
Status 11/5/2012

Introductory Notes

CS 200 has been eliminated from the PSU CS curriculum, so assembly language programming is de-emphasized. Some Microsoft assembly language programming will be covered in CS 201, but the focus will be on reading and understanding .asm source programs.

The main assembler used here is the Microsoft macro assembler, commonly known as masm. A version can be installed from Microsoft, but it requires that Visual C++ 2005 be installed.

Assembler mnemonics and symbols in masm differ somewhat from the asm source emitted by the gcc compiler; the latter is reminiscent of SPARC asm, with % identifying registers, and is also the notation used in our CS 201 textbook.

1. Find a downloadable masm version 8.0 here: http://www.microsoft.com/en-us/download/details.aspx?id=12654

2. Or find Microsoft's reference documentation for masm here: http://msdn.microsoft.com/en-us/library/afzk3475.aspx

Assembly language programs bridge the gap between low-level binary machine instructions and a higher-level interface for human programmers. The former are required for execution on a digital computer; the latter is a convenient tool of expression for programmers. Assembly language is a low-level, target-machine-specific interface.

Still, the assembler provides a level of abstraction. Users do not deal with the target in terms of the bits that represent binary machine instructions. The assembler elevates the user to the level of a textual language, up from the level of binary object code.

Common to many architectures is the separation of data space, instruction space, and perhaps other areas of program logic. The x86 architecture provides so-called data segments, code segments, and stack segments, and several of each if needed. Each segment is identified at run time by a segment register.

For example, the code label next: is interpreted by the hardware as seg:offset, where seg is the segment register cs and offset is the distance of next from the start of the code segment. In real mode the 16-bit segment value is shifted left by 4 bits (multiplied by 16) and the offset is added. If the offset of next is 248x and the value in the cs register is 2003x, the resulting run-time (code) address is 20030x + 248x = 20278x.
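The address arithmetic described above can be traced with a few instructions. The following is a minimal sketch for 16-bit real mode and is not part of the original course material; the label names main and next and the use of the register pair dx:ax for the result are assumptions made for illustration. It shifts the value of cs left by 4 bits and adds the offset of next, leaving the 20-bit linear address in dx:ax.

```asm
; Sketch: compute the 20-bit linear address of label "next" into dx:ax
; Assumes 16-bit real mode (DOS) and only 8086 instructions
        .model small
        .stack 100h
        .code
main:   mov  ax, cs            ; segment value of the running code
        mov  dx, ax            ; second copy for the high bits
        mov  cl, 4
        shl  ax, cl            ; ax = low 16 bits of cs * 16
        mov  cl, 12
        shr  dx, cl            ; dx = upper 4 bits of cs * 16
        add  ax, offset next   ; add the offset of the label
        adc  dx, 0             ; propagate a carry into the high word
next:   mov  ax, 4c00h         ; terminate via DOS, return code 0
        int  21h
        end  main
```

Two registers are needed because a 20-bit address does not fit into a single 16-bit register; the hardware performs the same shift-and-add internally on every memory access in real mode.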

This chapter introduces complete masm assembler source programs. Starting with the smallest possible but complete assembly program, which does nothing but ask DOS to assist in its own termination, we progress to more sophisticated programs.

One example emits a single character, the next prints a complete string onto the standard screen, followed by conventions that allow us to communicate with the assembler in an abbreviated way. We also discuss macros and simple procedures.

The “Definitions” below are in alphabetical order; we cover them in logical order, to minimize forward referencing.

Syllabus

• Motivation
• Definitions
• Null Program
• Print Single Character
• Print Character String
• Assembler Abbreviations
• Assembler Macros
• Assembler Procs
• Assemble and Link
• References

Motivation

• It is almost impossible to communicate with a machine at the binary level
• The assembler offers a significant level of abstraction from the machine bits, plus relocatability, symbolic names and addresses, and limited program reuse
• Symbols permit easy definition and reference of data and code objects
• Microsoft’s masm even offers high-level constructs, similar to the statements of high-level languages
• Assembler programming allows the highest level of control over the target machine
• And it permits the highest performance, at least for short code sections

Definitions

• Address: Identifying attribute of any distinguishable memory unit. On the old x86 architecture a logical address is a pair seg:offset, translated by the hardware into a so-called linear address. Segment and offset are each 16 bits long in real mode. The machine address, called the linear address, is 20 bits long; the rightmost (low-order) 4 bits of a segment start address are implied to be 0, since a segment must be 16-byte aligned

• Alignment: Attribute of an address a, requiring that a lie on a specified boundary; for example, the address a must be even, or must be evenly divisible by 4 or by 512. The first case is also called modulo-2 alignment, the last modulo-512 alignment. Note that aligned addresses have some of their low-order bits set to 0. Hence, if these addresses are stored in hardware, these 0s can be omitted, i.e. they are implied whenever the complete address is needed
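Alignment can also be requested from the assembler directly. The sketch below is an illustration under stated assumptions, not taken from the original slides: the names data_s, b1, w1, b2, and d1 are hypothetical, and the segment is declared with para alignment so that the align 4 directive cannot exceed the alignment of the segment itself. The masm directives even and align insert padding bytes so that the next item starts at a suitably aligned offset.

```asm
; Sketch: alignment directives in a data segment (assemble only, no entry point)
data_s  segment para        ; segment starts on a paragraph (16-byte) boundary
b1      db   1              ; offset 0
        even                ; pad with 1 byte to reach an even offset
w1      dw   2              ; offset 2, modulo-2 aligned
b2      db   3              ; offset 4
        align 4             ; pad with 3 bytes to reach an offset divisible by 4
d1      dd   4              ; offset 8, modulo-4 aligned
data_s  ends
        end                 ; no argument: this file defines no entry point
```

The offsets in the comments follow from the sizes of the preceding items; an assembler listing file would show them in its offset column.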

• Assembler: Source-to-object translator that reads relocatable, abstract, machine-specific source programs and translates them into binary object code. After linking, the binary code is executable

• Binary Object: Strings of bits which, when interpreted by the target machine, are legal machine operations plus associated memory references. Jointly, these represent executable programs

• Code Segment: Subsection of an architecture’s memory which holds executable instructions, possibly with embedded immediate operands

• Data Segment: Subsection of an architecture’s memory which holds data being referenced or manipulated. Like any segment, a data segment is identified by a segment register, holding its start address. Such an address must be evenly divisible by 16 on the x86 family processors. Such addresses are also called paragraph boundaries

• Offset: Distance of a named object (addressable unit) from the beginning of the area that encompasses the name

• Paragraph: Range of contiguous memory addresses that is 16 bytes long, and whose first byte address is evenly divisible by 16

• Relocation: Ability of program information (code or data) to be placed at any location in memory. For example, referring to data (or object code) by offsets relative to some start address allows the code to be placed anywhere, as long as the respective start address is always added at execution time

• Segment: A subsection of memory. It is identified by a segment register and holds either code, data, or stack space; it usually has an alignment constraint

• Stack: Data structure holding data that are accessed only in a particular way, named LIFO (last in, first out). The amount of data varies over time. Data are added through an operation called pushing and removed via popping (on the x86 architecture). The stack segment register ss identifies the stack segment, the stack pointer sp holds the offset of the current, varying top, and the base pointer bp is conventionally used to address data inside the stack

• Top of Stack: The one element on the stack that is accessible (visible). There may be other elements in the stack, hidden by the top element. Additional elements are created by pushing, and elements are removed by popping
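The LIFO discipline can be observed with two pushes followed by two pops. This is a minimal sketch, not part of the original slides; the label main and the constants 1111h and 2222h are arbitrary choices for illustration. On the 16-bit x86 each push decrements sp by 2 and each pop increments it by 2.

```asm
; Sketch: last in, first out with push and pop
        .model small
        .stack 100h
        .code
main:   mov  ax, 1111h
        mov  bx, 2222h
        push ax             ; top of stack is now 1111h
        push bx             ; top of stack is now 2222h, hiding 1111h
        pop  cx             ; cx = 2222h, the value pushed last
        pop  dx             ; dx = 1111h, the value pushed first
        mov  ax, 4c00h      ; terminate via DOS, return code 0
        int  21h
        end  main
```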

Null Program

• Set up the program’s segments: code, data, and stack
• In the sample below there is only a code segment
• Note the 'code' string to identify the code segment
• Communicate the implied segment portion of seg:offset in the assume pseudo-instruction
• Define the start address (actually an offset) via a label; here the label is named start:
• A label is a user-defined identifier followed by a colon, in the code segment
• Use DOS services: here 4ch to terminate
• DOS services are requested via INT 21h
• The specific service is defined in register ah and possibly other registers
• The return code is zero, meaning: no errors occurred
• Note comments, introduced by ;
• A comment ends at the end of a line

; Source file: out1.asm
; Author: Herb Mayer
; Purpose: simple, meaningless program, no data seg, no stack
; Assembler: Microsoft assembler, command "masm"; 16-bit version

code_s  segment 'code'
        assume cs:code_s    ; communicate implied seg register
start:  mov al, 00          ; termination code for DOS 21h, 4ch
        mov ah, 4ch         ; tell DOS to terminate, 4ch in ah
        int 21h             ; call DOS routine 21h for help
code_s  ends                ; end of [code] segment
        end start           ; end’s argument defines start
                            ; sounds like Microsoft, say start to stop

Print Single Character

• Also define a data segment and a stack segment, though both are unused
• dw 999 reserves (defines) an integer word, initialized to 999; not used
• dw 100 dup( 0 ) defines 100 words, all initially 0; not used
• DOS routine 21h is called for help: INT 21h
• The value in ah tells DOS which type of help is needed
• E.g. value 2 in ah means: output 1 character, the one in dl
• So DOS routine 2 prints the character found in register dl
• Moving 4c00h into register ax is the same as moving 4ch into register ah and 00 into al
• ah and al are just the two byte registers that, concatenated, form ax; and this will terminate the program

; Purpose: simple program to output one character
; Assembler: Microsoft assembler, command "masm"; 16-bit

data_s  segment             ; unused data segment
        dw 999              ; define a word, init to 999
data_s  ends

stack_s segment             ; unused stack segment
        dw 100 dup( 0 )     ; reserve 100 words, init to 0
stack_s ends

code_s  segment 'code'      ; THE Code Segment
        assume cs:code_s, ds:data_s
start:  mov ax, seg data_s  ; initialize ds, indirectly
        mov ds, ax
        mov dl, '$'         ; char literal to be output by DOS
        mov ah, 2h          ; DOS call 2h emits char in dl
        int 21h             ; call DOS routine 21h
        mov ax, 4c00h       ; we wanna terminate, ah + al
        int 21h             ; terminate finally via DOS call
code_s  ends                ; repeat segment name at ends
        end start           ; end says: Where to start

Print Character String

• The data segment defines a string of bytes, initialized to some string literal, identified by msg
• Note the $ character at the end of the string literal
• It is used as the end criterion by DOS output routine 9
• The stack segment is still a dummy; it holds 10 strings, each of length 16, also unused, just to show a stack segment to students
• DOS routine 9 emits a character string terminated by ‘$’
• It finds the start address of the string in ds:offset, with the offset communicated in register dx
• Note the built-in function offset applied to a data label
• Masm also provides a built-in seg function to generate the other part of the address

; Purpose: simple program to output character string

data_s  segment
msg     db "Hello class$"   ; note ’$’ termination
data_s  ends

stack_s segment             ; unused
        db 10 dup( "---S t a c k----" )
stack_s ends                ; repeat the name

code_s  segment 'code'
        assume cs:code_s, ds:data_s
start:  mov ax, seg data_s
        mov ds, ax
        mov dx, offset msg  ; string 2 b output by DOS
        mov ah, 9h          ; DOS call 9h emits string
        int 21h             ; call DOS
        mov ax, 4c00h       ; we wanna terminate, ah + al
        int 21h             ; terminate finally via DOS
code_s  ends                ; end code seg
        end start           ; start execution here: at ‘start’
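Because DOS routine 9 stops only at the first ‘$’, a message may contain control characters and may be spread over several db lines. The following variation is a sketch, not from the original slides: the second text line and the stack combine type on the stack segment are assumptions added for illustration (the combine type lets the linker recognize the segment as the program stack).

```asm
; Sketch: two-line message, ended by a single '$'
data_s  segment
msg     db   "Hello class", 0dh, 0ah  ; text, then carriage return and line feed
        db   "second line$"           ; '$' ends the whole message
data_s  ends

stack_s segment stack                 ; 'stack' combine type: an assumption, see above
        dw   100 dup( 0 )
stack_s ends

code_s  segment 'code'
        assume cs:code_s, ds:data_s
start:  mov  ax, seg data_s
        mov  ds, ax                   ; ds now identifies the data segment
        mov  dx, offset msg           ; ds:dx is the start of the message
        mov  ah, 9h                   ; DOS routine 9: output string up to '$'
        int  21h
        mov  ax, 4c00h                ; terminate, return code 0
        int  21h
code_s  ends
        end  start
```

The two db lines are adjacent in memory, so DOS sees one contiguous byte string and prints both lines with a single call.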

Assembler Abbreviations

• Directive .model small allows for default abbreviations and assumptions
• For example data, code, stack, @data are predefined, as are assume statements
• Here another string is printed, “Hello”; note again the $ terminator
• The macro @data is predefined by masm, same as seg data
• Note again the offset function
• Note again DOS routine 9, to output the string of characters at the address found in register dx
• A program using the .model small abbreviation is smaller, more compact
• .code ends the previous segment, if any (here data), and starts the code segment
• .data ends the previous segment, if any, and starts the data segment, etc.

; Source file: out4.asm ; note: 16-bit assembler
; Purpose: simpler program to output character string

        .model small        ; assumes stack data code
        .stack 10h          ; assumes name: stack
        .data               ; assumes name: data
hi      db "Hello$"
        .code               ; assumes name: code
start:  mov ax, @data       ; @data predefined macro
        mov ds, ax          ; now data segment reg set
        mov dx, offset hi   ; string 2 b output by DOS
        mov ah, 9h          ; DOS call 9h emits string
        int 21h             ; call DOS
        mov ax, 4c00h       ; we wanna terminate, ah + al
        int 21h             ; terminate finally
        end start           ; start here, at “start”!

Assembler Macros

• Tired of writing segment and ends?
• The .model small allows defaults and abbreviations
• Macros make program source more readable, easier to maintain
• A macro can be defined anywhere in the assembler source
• It is introduced by a user-defined name and the macro keyword
• It is terminated by the endm keyword
• Macros may have 0 or more parameters, to be used in the macro body
• When a macro name is used, its body is expanded in-line at that place

start   macro               ; no parameters
        mov ax, @data       ; @data predefined macro
        mov ds, ax          ; now data segment reg set
        endm                ; end of start macro

Put_Str macro Str           ; one formal parameter, “Str”
        mov dx, offset Str  ; string 2 b output by DOS
        mov ah, 9h          ; DOS call 9h emits string
        int 21h             ; call DOS
        endm                ; end of Put_Str macro

Done    macro ret_code      ; formal parameter “ret_code”
        mov ah, 4ch         ; we wanna terminate, ah = 4c
        mov al, ret_code    ; communicate return code
        int 21h             ; terminate finally via DOS
        endm                ; end of macro body of Done

        .model small        ; predefined assumptions
        .stack 10h          ; assumes segment name: stack
        .data               ; assumes segment name: data
hi      db "Hello$"         ; terminate string with $
        .code               ; assumes segment name: code
main:   start
        Put_Str hi          ; invoke macro Put_Str, w. hi
        Done 0
        end main            ; start at: main!
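A macro parameter need not be a data label; it can be any text that is legal at the place where the formal parameter appears in the body. The sketch below is not from the original slides: the macro name Put_Chr is hypothetical and wraps DOS routine 2, which was used earlier to print a single character. Each invocation expands in-line into the three instructions of the body, with Chr replaced by the actual parameter.

```asm
; Sketch: a one-parameter macro around DOS routine 2 (name Put_Chr is hypothetical)
Put_Chr macro Chr           ; one formal parameter, "Chr"
        mov  dl, Chr        ; character to be output
        mov  ah, 2h         ; DOS routine 2 emits the char in dl
        int  21h
        endm

        .model small
        .stack 100h
        .code
main:   Put_Chr 'O'         ; expands to mov dl, 'O' / mov ah, 2h / int 21h
        Put_Chr 'K'         ; second expansion, with 'K'
        mov  ax, 4c00h      ; terminate, return code 0
        int  21h
        end  main
```

Since the body is copied at every use, two invocations produce two copies of the code; a procedure, in contrast, exists once and is called.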

Assembler Procs

• An assembler procedure is delimited by proc and endp
• A procedure can be called, and provides a syntactic grouping mechanism to form logical modules
• Syntax rule for procedures: the procedure name is not followed by ‘:’, unlike a label
• The return instruction ret ends the procedure body and allows return to the place of call
• Reminiscent of a high-level language construct

; Purpose: modular macro program to output string

start   macro               ; no parameters
        mov ax, @data       ; @data predefined macro
        mov ds, ax          ; now data segment reg set
        endm                ; end of “start” macro body

Put_Str macro Str           ; “Str” must be data label
        . . . other macros as before
        endm                ; see earlier def of Put_Str macro

        .data               ; assumes name: data
hi      db "Hello$"         ; terminate string with $
        .code               ; assumes name: code
main    proc                ; begin of procedure body
        start               ; invoke “start” macro
        Put_Str hi          ; invoke “Put_Str” w. actual
        Done 0              ; invoke “Done” with actual 0
        ret                 ; unnecessary, unreachable
main    endp
        end main            ; entry point is “main”
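In the listing above main is entered by the loader and never returns, so its ret is not reached. The sketch below, which is not from the original slides, shows a procedure that is actually called and returns; the procedure name put_hi is hypothetical. The call instruction pushes the return address on the stack, and ret pops it again to continue after the call.

```asm
; Sketch: a procedure that is called twice (name put_hi is hypothetical)
        .model small
        .stack 100h
        .data
hi      db   "Hello$"       ; '$' terminates the string for DOS routine 9
        .code
put_hi  proc                ; no ':' after a procedure name
        mov  dx, offset hi  ; ds:dx is the start of the string
        mov  ah, 9h         ; DOS routine 9 emits the string
        int  21h
        ret                 ; return to the instruction after the call
put_hi  endp

main    proc
        mov  ax, @data
        mov  ds, ax         ; data segment register is set
        call put_hi         ; first call
        call put_hi         ; second call, same code is reused
        mov  ax, 4c00h      ; terminate, return code 0
        int  21h
main    endp
        end  main           ; entry point is main, not put_hi
```

The entry point is chosen by the argument of end, not by the order of the procedures in the source, so put_hi may precede main.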

Assemble and Link

• Microsoft’s old Macro Assembler masm, versions 5.10 to 8.0
• Borland Macro Assembler tasm
• Microsoft’s newer Macro Assembler ml 6.22
• Again: Microsoft masm assembler 8.0 for 32-bit processors here: http://www.microsoft.com/en-us/download/details.aspx?id=12654
• Microsoft masm for x64 here: http://msdn.microsoft.com/en-us/library/hb5z4sxd.aspx
• Microsoft Linker link
• Borland Linker tlink

The old version of the Microsoft macro assembler (up to about 2003, with .NET 2003) is named masm. The newer assembler product from Microsoft is named ml. This section explains the masm command briefly.

Users should consult the on-line help by typing masm/h to get more detailed information. The masm command of version 5.10 and older has 4 arguments, separated from one another by commas. These arguments are file names. An argument is considered omitted if no comma (and thus no file name) is given for it.

The assembler prompts you for each omitted argument, so it is generally better to provide them all, or at least the commas; otherwise the assembler repeatedly asks for file names and waits for input or carriage returns.

Assemble Command

If commas without file names are given, then default file names are assumed. The four file names, which are the arguments of the masm command, are left to right:

• assembly source program, say source.asm
• object program generated by the assembler, say source.obj
• the listing, generated by the assembler, say source.lst
• the cross-reference file, named source.crf

The suffixes obj, lst, and crf are automatically generated by the assembler, if no other names are provided. Some complete masm commands for the assembler file src1.asm would be:

masm src1.asm, src.obj, src.lst, src.crf ; no prompting
masm src1,src1,src1,src1 ; no prompting
masm src1,src1.obj,src1,src1.crf ; no prompting
masm src1,,,; ; no prompting

In the above cases the masm assembler will not prompt you, because you provided all file names or left them to the defaults. It is smart enough to supply suffixes (like .lst and .obj) from the respective positions.

Link Command

Link also has 4 arguments: the object input, two output files (the executable and the load map), and the library. The object input may be a concatenation of multiple object files (typically ending in the .obj suffix), strung together by the + operator. For example:

link mem0 + putdec,,,

creates an executable mem0.exe. The file name mem0 is derived from the first part of the first argument; the suffix .exe is assumed. The object file putdec.obj is also used as input, to resolve some of the external names used in mem0.obj. The arguments of the link command, i.e. the 4 file names, are:

• object file or object files, concatenated by +, with default suffix .obj
• the linked executable, with suffix .exe
• the load map file, whose name ends in .map
• the library

If the input file is provided without a suffix, then the suffix .obj is assumed. If the executable file is specified without a suffix, then .exe is assumed; any other file name and explicit suffix are allowed too. The file for the load map should be specified; if none is provided, the linker uses the file name nul. If a map file name is given without a suffix, then the .map suffix is assumed. Similarly, a file name can be specified for the library; its default suffix is .lib. The commands below do not cause the linker to prompt you for additional file names, because the linker is allowed to assume the missing information:

link mem0 + putdec,,,, ; mem0.exe, no map, no library
link mem0+putdec,foo.bar,,, ; generate executable foo.bar
link putdec+mem0,mem0.exe,,, ; mem0.exe

Note that the concatenation operator + may be embedded in any number of blanks. The commas may also be surrounded by blanks. The order of specifying the object files is immaterial, provided that the main entry point is unambiguous. The commands below cause the linker to prompt for some additional information:

link mem0 + putdec ; ask for executable, map, and library
link mem0+putdec,x.y ; ask for map and lib
link putdec+mem0,, ; gen putdec.exe, ask for map and lib

Main Entry Point

Each assembly unit concludes with an end directive (end statement). This end statement may have an argument, naming one of the labels or proc names of the program. Such a label specifies the entry point, i.e. the initial value of ip, set by the loader.

However, if an executable is composed of multiple objects, there may be only a single entry point. All other source modules should not specify an argument after their end statement. If two or more of the object modules to be linked into an executable nevertheless specify an entry point, the tools do not complain. Instead, the entry point is taken from the first of the objects listed in the first argument of the link command. If this is not the intended entry point, program execution will bring surprises.
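The rule of a single entry point can be illustrated with two source modules. This is a sketch under stated assumptions, not taken from the original slides: the file names main1.asm and util.asm and the procedure name put_nl are hypothetical. Only the first module passes an argument to end; the second exports its procedure with public and ends with a bare end.

```asm
; main1.asm (hypothetical file name): the only module that names an entry point
        .model small
        .stack 100h
        .data
hi      db   "Hello$"
        .code
        extrn put_nl:near   ; put_nl is defined in util.asm
main:   mov  ax, @data
        mov  ds, ax         ; data segment register is set
        mov  dx, offset hi
        mov  ah, 9h         ; DOS routine 9 emits the string
        int  21h
        call put_nl         ; resolved by the linker
        mov  ax, 4c00h      ; terminate, return code 0
        int  21h
        end  main           ; entry point of the whole executable
```

```asm
; util.asm (hypothetical file name): helper module without an entry point
        .model small
        .code
        public put_nl       ; make the name visible to other modules
put_nl  proc
        mov  dl, 0dh        ; carriage return
        mov  ah, 2h         ; DOS routine 2 emits the char in dl
        int  21h
        mov  dl, 0ah        ; line feed
        mov  ah, 2h
        int  21h
        ret
put_nl  endp
        end                 ; no argument: this module defines no entry point
```

Following the conventions of this chapter, both files would be assembled separately and the two objects combined with a command such as link main1 + util,,, where the order of the objects does not matter because only one of them defines an entry point.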

References

1. Free masm download: http://cvrce.blog.com/2009/08/28/masm-v611-free-download/

2. http://www.emsps.com/oldtools/msasmv.htm

3. ML 64-bit: http://msdn.microsoft.com/en-us/library/s0ksfwcf(v=vs.80).aspx