nasm

Code Generation : Assemblersyntax

NASM

The NASM assembler is an open source project to develop a Net-wide Assembler. The assembler is included as standard inmost Linux distributions and is available for download to run under Windows. It provides support for the full Intel and AMD SIMDinstruction-sets and also recognises some extra MMX instructions that run on Cyrix CPUs. NASM provides support for multipleobject module formats from the old MS-DOS com files to the obj and elf formats used under Windows and Linux. If you areprogramming in assembler, NASM provides a more complete range of instructions, in association with better portability betweenoperating systems than competing assemblers. Microsoft’s MASM assembler is restricted to Windows. The GNU assembler,as, runs under both Linux and Windows, but uses non-standard syntax which makes it awkward to use in conjunction with Inteldocumentation.

It is beyond the scope of this course to provide acomplete guide to assembler programming for theIntel processor family. Readers wanting a generalbackground in assembler programming should con-sult appropriate text books in conjunction with theprocessor reference manuals published by Intel andAMD .

General instruction syntax

Assembler programs take the form of a sequenceof lines with one machine instruction per line. Theinstructions themselves take the form of an optionallabel, an operation code name conditionally followedby up to three comma separated operands. For ex-ample:

l1: SFENCE ; 0 operand instruction

PREFETCH [100] ; 1 operand instruction

MOVQ MM0,MM1 ; 2 operand instruction

PSHUFD XMM1,XMM3,00101011b ; 3 operand instruction

As shown above, a comment can be placed on anassembler line, with the comment distinguished fromthe instruction by a leading semi-colon. The label, ifpresent is separated from the operation code name

General instruction syntax

by a colon.

Case is significant neither in operation code namesnor in the names of registers. Thus prefetch isequivalent to PREFETCH and mm4 equivalent to MM4.

In the NASM assembler, as in the original Intel as-sembler, the direction of assignment in an instructionfollows high level language conventions. It is alwaysfrom right to left1, so that

MOVQ MM0,MM4

is equivalent to

MM0:=MM4

and

ADDSS XMM0,XMM3

is equivalent to1If you chose to use the GNU assembler as, instead of NASM you should be aware that this follows the opposite convention of left to right

assignment. This is a result of as having originated as a Motorola assembler that was converted to recognise Intel opcodes. Motorolafollow a left to right assignment convention.

XMM0:= XMM0 + XMM3

Operand forms

Operands to instructions can be constants, registernames or memory locations.ConstantsConstants are values known at assembly time, and

take the form of numbers, labels, characters or arith-metic expressions whose components are themselvesconstants.

Constant forms

The most important constant values are numbers.Integer numbers can be written in base 16, 10, 8 or2.

mov al,0a2h ; base 16 leading zero required

mov bh,$0a2 ; base 16 alternate notation

mov cx,0xa2 ; base 16 C style

add ax,101 ; base 10

mov bl,76q ; base 8

xor ax,11010011b ; base 2

Floating constants

Floating point constants are also supported as operandsto store allocation directives :

dd 3.14156

dq 9.2e3

It is important to realise that due to limitations of theAMD and Intel instruction-sets, floating point constantscan not be directly used as operands to instructions.Any floating point constants used in an algorithm haveto be assembled into a distinct area of memory andloaded into registers from there.

Labels

Constants can also take the form of labels. As theassembler program is processed, NASM allocates aninteger value to each label. The value is either theaddress of the operation-code prefixed by the instruc-tion or may have been explicitly set by an EQU direc-tive:

Fseek equ 23

Fread equ 24

We can load a register with the address refered to bya label by including the label as a constant operand:

mov esi, sourcebuf

Using the same syntax we can load a register with anequated constant:

Labels

mov cl, fread

Constant expressions

Suppose there exists a data-structures for which onehas a base address label, it is often convenient to beable to refer to fields within this structure in terms oftheir offset from the start of the structure. Considerthe example of a vector of 4 single precision float-ing point values at a location with label myvec. Theactual address at which myvec will be placed is de-termined by NASM, we do not know it. We may knowthat we want the address of the 3rd element of thevector:

mov esi, myvec + 3 *4

will place the address of this word into the esi register.

Constant expressions

NASM allows one to place arithmetic expressionswhose sub-expressions are constants wherever a con-stant can occur. The arithmetic operators are writtenC style as shown below.

operator means operator means| or + add^ xor - subtract& and * multiply<< shift left / signed division>> shift right // unsigned division% modulus %% unsigned modulus

Registers

You should be aware that in the Intel architecturea number of registers are aliased to the same statevectors, thus for example the eax, ax, al, ah reg-isters all share bits. More insidiously the floating pointregisters ST0..ST7 not only share state with the MMXregisters, but their mapping to these registers is dy-namic and variable.

register classes

byte word dword float nnx ssenumber reg reg reg reg reg reg

Aliased Aliased0 al ax eax st0 mm0 xmm0

1 cl bx ecx st1 mm1 xmm1

2 dl cx edx st2 mm2 xmm2

3 bl bx ebx st3 mm3 xmm3

4 ah sp esp st4 mm4 xmm4

5 ch bp ebp st5 mm5 xmm5

6 dh si esi st6 mm6 xmm6

7 bh di edi st7 mm7 xmm7

Memory Locations

Memory locations are syntactically represented bythe use of square brackets around an address ex-pression thus: [100], [myvec], [esi] all rep-resent memory locations.The address expressions, unlike constant expres-

sions, can contain components whose values are notknown until program execution. The final exampleabove refers to the memory location addressed bythe value in the esi register, and as such, dependson the history of prior computations affecting that reg-ister.

Examples

byte[edx]

byte pointed to by edx

dword[edx+10]

4 byte word pointed to by edx+10

dword[edx+esi+34]

4 byte word at the address given by the sum of theesi and edx registers +34

word[eax+edi*4+200]

2 byte word at the address given by the eax register+ 4* edi register + 200The format qualifiers are used to disambiguate the

size of an operand in memory where the combina-tion of the operation code name and the other non-memory operands are insufficient so to do.

Sectioning

Programs running under Linux have their memorydivided into 4 sections:

text is the section of memory containing oper-ation codes to be executed. It is typicallymapped as read only by the paging sys-tem.

data is the section of memory containing ini-tialised global variables, which can be al-tered following the start of the program.

bss is the section containing uninitialsed globalvariables.

stack is the section in which dynamically allo-

Sectioning

cated local variables of subroutines arelocated.

The section directive is used by assembler programersto specify into which section of memory they wantsubsequent lines of code to be assembled. For ex-ample in the listing shown in algorithm 1 we dividethe program into three sections: a text section con-taining myfunc, a bss section containing 64 unde-fined bytes and a data section containing a vector of4 integers.

The label myfuncbase can be used with negativeoffsets to access locations within the bss, wilst thelabel myfuncglobal can be used with positive off-sets to access elements of the vector in the data sec-

Algorithm 1 Examples of the use of section and data reservation directivessection .text

global myfuncmyfunc:enter 128,0; body of function goes here

leaveret 0

section .bssalignb 16resb 64 ; reserve 64 bytes

myfuncBase:section .datamyfuncglobal: ; reserve 4 by 32-bit integers

dd 1dd 2dd 3dd 5

Data reservation

Data must be reserved in distinct ways in the differ-ent sections. In the data section, the data definitiondirectives db, dw, dd, and dq are used to definebytes, words, doublewords and quad words. The di-rective must be followed by a constant expression.When defining bytes or words the constant must bean integer. Doublewords and quadwords may be de-fined with floating point or integer constants as shownpreviously.In the bss section the directive resb is used to re-

serve a specified number of bytes, but no value isassociated with these bytes.

Stack data

Data can be allocated in the stack section by useof the enter operation code name. This takes theform:enter space, level

It should be used as the first operation code name ofa function. The level parameter is only of relevancein block structured languages and should be set to0 for assembler programming. The space parame-ter specifies the number of bytes to be reserved forthe private use of the function. Once the enter in-struction has executed, the data can be accessed atnegative offsets from the ebp register.

Releasing stack space dynamically

The last two instructions in a function should, asshown in algorithm be

leave

ret 0

The combined effect of these is to free the space re-served on the stack by enter, and pop the return ad-dress from the stack. The parameter to the operationcode name ret is used to specify how many bytesof function parameters should be discarded from thestack. If one is interfacing to C this should always beset to 0.

Label qualification

The default scope of a label is the assembler sourcefile containing the line it prefixes. But labels can beused to mark the start of functions that are to becalled from C or other high level languges. To indicatethat they have scope beyond the current asscemblerfile, the global directive should be used as shownin algorithm 1.The converse case, where an assembler file calls a

function exported by a C program is handled by theetern directive:extern printreal

call printreal

Label qualification

in the above example we assume that printrealis a C function called from assembler.

Linking and object file formats

There are 4 object file formats that are commonlyused on Linux and Windows systems as shown intable ??. This lists the name of the format, its fileextension - which is often ambiguous and the combi-nation of operating system and compiler that makesuse of it. A flag provided to Nasm specifies which for-mat it should use. We will only go into the use of thegcc compiler, since this is portable between Windowsand Linux.

Let us assume we have a C program called c2asm.cand an assembler file asmfromc.asm. Suppose wewish to combine these into a single executable mod-ule c2asm. We issue the following commands at theconsole:

nasm -felf -o asmfromc.o asmfromc.asm

gcc -oc2asm c2asm.c asmfromc.o

This assumes that we are working either under Linuxor under Cygwin. If we are using djgpp we type:

nasm -fcoff -o asmfromc.o asmfromc.asm

gcc -oc2asm c2asm.c asmfromc.o

Format Extension Operating System C++ Compilerwin32 .obj Windows Microsoft C++

obj .obj Windows Borland C++coff .o Windows Djgpp gcc.elf .o Windows Cygwin gcc.elf .o Linux gcc

Leading underbars

If working with djgpp all external labels in your pro-gram, whether imported with extern or imported withglobal must have a leading underbar character. Thusto call the C procedure printreal one would write:extern _printreal

call _printreal

whilst to export myfunc one would write

global _myfunc

_myfunc:enter 128,0

Useful x86 instructions

This is a very small subset of the available instruc-tions but should be enough for your purposes.

Data movement

mov mem, reg/litexample

mov [ebp+12],eax

mov dword[esp+4],12

meansstore the right operand in the memory location on

the left

mov reg, mem/reg/litexample

mov ebx,1

mov ecx,[esi+ebp]

mov eax,ebx

meansload the right operand into the register on the left

movss mem, regexample

movss [ebp+12],xmm0


the left. The right operand is the bottom 32 bits of anxmm register.

movss reg, mem/regexample

movss xmm1,[esi+ebp]

movss xmm1,xxm2

meansload the right operand into the register on the left,

the left operand is the lower 32 bits of a xmm registerand the data should be a 32 bit float

movups

movups mem, regexample

movss [ebp+12],xmm0


the left. The right operand is a 128 bit xmm register.

movss

movss reg, memexample

movups xmm1,[esi+ebp]

meansload the right operand into the register on the left,

the left operand is a 128 bit xmm register

push

push mem/reg/litexample

push dword 10

push dword[esi+ebp+40]

push ecx

meanspush the operand on stack, pre-decrementing the

esp register by 4

pop

pop mem/regexample

pop dword[esi+ebp+40]

pop ecx

meansthe operand is assigned the value on the top of stack

and the stack pointer is then incremented by 4

fld

fld memexample

fld dword[esi+ebp+40]

meansthe operand which is assumed to be a 32 bit floating

point value is pushed on the fpu stack

fild

fild memexample

fild dword[esi+ebp+40]

meansthe 32bit integer operand is pushed on the fpu stack

as a floating point number

fstp

fstp memexample

fstp dword[esi+ebp+40]

meansthe operand is assigned the 32bit floating point value

on the fpu stack the fpu stack is then popped

fistp

fistp memexample

fistp dword[esi+ebp+40]

meansthe 32bit floating point value on the fpu stack is con-

verted to an integer and stored in the operand, thefpu stack is then popped.

Arithmetic

Integer arithmetic instructions can be divided into 3classes

1. Add, subtract, and, or, xor.These are treated ab-solutely regularly as two operand instructions asshown below in section ??.

2. Multiply, this comes in both 2 and 3 operand forms.

3. Divide and Modulus, these are irregular and makeuse of specific registers

Regular integer arithmetic

These take the formoperation dest, srcand mean dest:= dest operation srcthe following operation codes are allowed

add, sub, and, or, xor

The table shows the allowed combinations of desti-nation and sourceOperand combinations for regular arithmetic

dest srcregister registerregister constantregister memorymemory registermemory constantExamples

add esp,5

sub eax, ebx

and [eax+12],ebp

add dword[esi+edi],1

add esi,[edi]

Multiply

imul reg,reg/memThis is functionally the same as the regular 2 operand

integer arithmetic instructions.Example

imul ebx, dword [ebp-26]

imul reg,reg,constThis three operand form is particularly useful for com-

puting array offsets.Example

imul esi,eax,16

Divide/modulus

A single instruction is used for both division and mod-ulus.idiv reg/memThe 64 bit value in edx:eax is divided by the operand,

the quotient is placed in eax, and the remainder isplaced in edx.Example

idiv [ebp+64]

Floating point arithmetic

The floating point stack can be used to perform arith-metic in a postfix manner. The following fpu opcodesoperate on the top two items on the fpu stack:

faddp st1, fsubp st1, stafdivp st1

fmulp st1

These perfrom an operation between the top of thefpu stack (st0) and st1, store the result in st1, thenpop the stack so that st1 becomes the new top ofstack. Bear in mind that the maximum depth of thefpu stack is 8. Operations use 80bit internal floatingpoint.

Vector arithmetic

It is possible to perform parallel operations on vec-tors of 32 bit floats using the xmm registers. Theseinstructions have the general formatoperationPS xmmreg,xmmregFor example

mulps xmm0,xmm5

the suffix PS stands for Packed Single precison floats.In this case the 4 floats in xmm0 are multiplied by thecorresponding floats in xmm5 and the result stored inxmm0.

The other useful vector arithmetic instructions in thiscontext are:

addps, subps, divps

These instructions also exist in a memory to registerform but for these to be used you have to guaran-tee that the operands are aligned on 16 byte mem-ory boundaries. Since this is complicated to ensure,I suggest that you restrict yourself to the register toregister forms of these instructions.

Scalar arithmetic

It is also possible to perform scalar arithmetic in thelow order 32 bit words of the xmm registers. For in-stance, you can do all of the vector operations by us-ing the subscript SS standing for Scalar Single preci-sion after the operation thus:

addss xmm2,xmm0

would add the bottom 32 bit float in xmm0 to the bot-tom float in xmm2 and leave the result in xmm2.

Conversion instructions

operation dest srccvtsi2ss xmm register general registercvtsi2ss xmm register memorycvtss2si general register xmm registercvtss2si memory xmm register

If you are going to use these scalar instructions it is worth taking note of the conversion instructions cvtsi2ss and cvtss2siwhich convert signed doubleword integers to single precision floats and vice versa.

Examples

cvtsi2ss xmm4, ebx

cvtsi2ss xmm3, [ebp+20]

cvtss2si eax, xmm0

Integer comparisons

Comparison instructions exist which will place theresults of comparison in the flags. The cmp instruc-tion compares two integers.Examples

cmp eax, 12

cmp eax, ecx

cmp ebx, [ebx+16]

Set

The result of the comparison is written to the flagsand can be used either by a SET instruction or by aconditional jump instruction. For instance to test if theeax register was less then 10 we could write

cmp eax,10

setl bl

At the end of this the bl register will contain a booleanvalue of 1 if eax had been less than 10 and 0 if it hadbeen greater than 10. The suffixes used by the SETinstruction indicate which comparison is being tested.The suffixes that are most likely to be of use to youare L, G and E standing for <, >, and =.

fcomip st0,st1

The instruction fcomip compares the top two ele-ments of the floating point stack, popping the top onefrom the stack and placing the result of the compari-son in the cpu flags.It may be necessary to discard the next item on the

fpu stack using an fincstp instruction which incre-ments the floating point stack pointer.

cmpss

There are a family of comparison operations thatwork between scalar xmm registers. These leave aninteger result in one of the registers thus:

cmpltss xmm2,xmm4

would compare the float in the bottom 32 bits of xmm2with the corresponding float in xmm4 and set xmm2to all 1s if xmm2 was less than xmm4, otherwise itwould set xmm2 to zero.

Scalar comparisons

instruction meanscmpltss xmma,xmmb xmma<xmmb

cmpeqss xmma,xmmb xmma=xmmbcmpnless xmma,xmmb xmma>xmmb

comiss

Comiss is an alternative technique for performingscalar comparisons it compares the contents of twoxmm registers and returns the results in the cpu flags.Example

comiss xmm1, xmm7

Branches

Branches can be unconditional and direct:

jmp lab

or uncoditional and indirect:

jmp dword[ebp+10]

or conditional on a condition code and direct:

jl lab1

jg lab3

je lab4

Calls

Calls can be direct:

call lab

or indirect:

call dword[ebp+10]

in either case the current value of the eip registeris pushed on the stack and the eip register loadedfrom the operand. Returns are perfomed using theret instruction which pops the top of stack into theeip register.

nasm

Documents

32bit oating

oating point

xmm0 means

oc2asm c2asm

negative offsets

operation

gnu assembler

eip register