nasm
TRANSCRIPT
Code Generation : Assemblersyntax
NASM
The NASM assembler is an open source project to develop a Net-wide Assembler. The assembler is included as standard inmost Linux distributions and is available for download to run under Windows. It provides support for the full Intel and AMD SIMDinstruction-sets and also recognises some extra MMX instructions that run on Cyrix CPUs. NASM provides support for multipleobject module formats from the old MS-DOS com files to the obj and elf formats used under Windows and Linux. If you areprogramming in assembler, NASM provides a more complete range of instructions, in association with better portability betweenoperating systems than competing assemblers. Microsoft’s MASM assembler is restricted to Windows. The GNU assembler,as, runs under both Linux and Windows, but uses non-standard syntax which makes it awkward to use in conjunction with Inteldocumentation.
It is beyond the scope of this course to provide acomplete guide to assembler programming for theIntel processor family. Readers wanting a generalbackground in assembler programming should con-sult appropriate text books in conjunction with theprocessor reference manuals published by Intel andAMD .
General instruction syntax
Assembler programs take the form of a sequenceof lines with one machine instruction per line. Theinstructions themselves take the form of an optionallabel, an operation code name conditionally followedby up to three comma separated operands. For ex-ample:
l1: SFENCE ; 0 operand instruction
PREFETCH [100] ; 1 operand instruction
MOVQ MM0,MM1 ; 2 operand instruction
PSHUFD XMM1,XMM3,00101011b ; 3 operand instruction
As shown above, a comment can be placed on anassembler line, with the comment distinguished fromthe instruction by a leading semi-colon. The label, ifpresent is separated from the operation code name
General instruction syntax
by a colon.
Case is significant neither in operation code namesnor in the names of registers. Thus prefetch isequivalent to PREFETCH and mm4 equivalent to MM4.
In the NASM assembler, as in the original Intel as-sembler, the direction of assignment in an instructionfollows high level language conventions. It is alwaysfrom right to left1, so that
MOVQ MM0,MM4
is equivalent to
MM0:=MM4
and
ADDSS XMM0,XMM3
is equivalent to1If you chose to use the GNU assembler as, instead of NASM you should be aware that this follows the opposite convention of left to right
assignment. This is a result of as having originated as a Motorola assembler that was converted to recognise Intel opcodes. Motorolafollow a left to right assignment convention.
XMM0:= XMM0 + XMM3
Operand forms
Operands to instructions can be constants, registernames or memory locations.ConstantsConstants are values known at assembly time, and
take the form of numbers, labels, characters or arith-metic expressions whose components are themselvesconstants.
Constant forms
The most important constant values are numbers.Integer numbers can be written in base 16, 10, 8 or2.
mov al,0a2h ; base 16 leading zero required
mov bh,$0a2 ; base 16 alternate notation
mov cx,0xa2 ; base 16 C style
add ax,101 ; base 10
mov bl,76q ; base 8
xor ax,11010011b ; base 2
Floating constants
Floating point constants are also supported as operandsto store allocation directives :
dd 3.14156
dq 9.2e3
It is important to realise that due to limitations of theAMD and Intel instruction-sets, floating point constantscan not be directly used as operands to instructions.Any floating point constants used in an algorithm haveto be assembled into a distinct area of memory andloaded into registers from there.
Labels
Constants can also take the form of labels. As theassembler program is processed, NASM allocates aninteger value to each label. The value is either theaddress of the operation-code prefixed by the instruc-tion or may have been explicitly set by an EQU direc-tive:
Fseek equ 23
Fread equ 24
We can load a register with the address refered to bya label by including the label as a constant operand:
mov esi, sourcebuf
Using the same syntax we can load a register with anequated constant:
Labels
mov cl, fread
Constant expressions
Suppose there exists a data-structures for which onehas a base address label, it is often convenient to beable to refer to fields within this structure in terms oftheir offset from the start of the structure. Considerthe example of a vector of 4 single precision float-ing point values at a location with label myvec. Theactual address at which myvec will be placed is de-termined by NASM, we do not know it. We may knowthat we want the address of the 3rd element of thevector:
mov esi, myvec + 3 *4
will place the address of this word into the esi register.
Constant expressions
NASM allows one to place arithmetic expressionswhose sub-expressions are constants wherever a con-stant can occur. The arithmetic operators are writtenC style as shown below.
operator means operator means| or + add^ xor - subtract& and * multiply<< shift left / signed division>> shift right // unsigned division% modulus %% unsigned modulus
Registers
You should be aware that in the Intel architecturea number of registers are aliased to the same statevectors, thus for example the eax, ax, al, ah reg-isters all share bits. More insidiously the floating pointregisters ST0..ST7 not only share state with the MMXregisters, but their mapping to these registers is dy-namic and variable.
register classes
byte word dword float nnx ssenumber reg reg reg reg reg reg
Aliased Aliased0 al ax eax st0 mm0 xmm0
1 cl bx ecx st1 mm1 xmm1
2 dl cx edx st2 mm2 xmm2
3 bl bx ebx st3 mm3 xmm3
4 ah sp esp st4 mm4 xmm4
5 ch bp ebp st5 mm5 xmm5
6 dh si esi st6 mm6 xmm6
7 bh di edi st7 mm7 xmm7
Memory Locations
Memory locations are syntactically represented bythe use of square brackets around an address ex-pression thus: [100], [myvec], [esi] all rep-resent memory locations.The address expressions, unlike constant expres-
sions, can contain components whose values are notknown until program execution. The final exampleabove refers to the memory location addressed bythe value in the esi register, and as such, dependson the history of prior computations affecting that reg-ister.
Addresses
Address expressions have to be encoded into ma-chine instructions, and since machine instructions,although of variable length on a CISC are nonethe-less finite, so too must the address expressions be.On Intel and AMD machines this constrains the com-plexity of address expressions to the following gram-mer:
memloc::= address | format address
format::= byte | word | dword | qword
address::= [ const ] | [ aexp ] | [ aexp + const ]
aexp::= reg | reg + iexp
iexp::= reg | reg * scale
scale::= 2|4|8
reg::= eax | ecx | ebx | edx | esp | ebp | esi | edi
const::= integer | label
Examples
byte[edx]
byte pointed to by edx
dword[edx+10]
4 byte word pointed to by edx+10
dword[edx+esi+34]
4 byte word at the address given by the sum of theesi and edx registers +34
word[eax+edi*4+200]
2 byte word at the address given by the eax register+ 4* edi register + 200The format qualifiers are used to disambiguate the
size of an operand in memory where the combina-tion of the operation code name and the other non-memory operands are insufficient so to do.
Sectioning
Programs running under Linux have their memorydivided into 4 sections:
text is the section of memory containing oper-ation codes to be executed. It is typicallymapped as read only by the paging sys-tem.
data is the section of memory containing ini-tialised global variables, which can be al-tered following the start of the program.
bss is the section containing uninitialsed globalvariables.
stack is the section in which dynamically allo-
Sectioning
cated local variables of subroutines arelocated.
The section directive is used by assembler programersto specify into which section of memory they wantsubsequent lines of code to be assembled. For ex-ample in the listing shown in algorithm 1 we dividethe program into three sections: a text section con-taining myfunc, a bss section containing 64 unde-fined bytes and a data section containing a vector of4 integers.
The label myfuncbase can be used with negativeoffsets to access locations within the bss, wilst thelabel myfuncglobal can be used with positive off-sets to access elements of the vector in the data sec-
Algorithm 1 Examples of the use of section and data reservation directivessection .text
global myfuncmyfunc:enter 128,0; body of function goes here
leaveret 0
section .bssalignb 16resb 64 ; reserve 64 bytes
myfuncBase:section .datamyfuncglobal: ; reserve 4 by 32-bit integers
dd 1dd 2dd 3dd 5
tion.
Data reservation
Data must be reserved in distinct ways in the differ-ent sections. In the data section, the data definitiondirectives db, dw, dd, and dq are used to definebytes, words, doublewords and quad words. The di-rective must be followed by a constant expression.When defining bytes or words the constant must bean integer. Doublewords and quadwords may be de-fined with floating point or integer constants as shownpreviously.In the bss section the directive resb is used to re-
serve a specified number of bytes, but no value isassociated with these bytes.
Stack data
Data can be allocated in the stack section by useof the enter operation code name. This takes theform:enter space, level
It should be used as the first operation code name ofa function. The level parameter is only of relevancein block structured languages and should be set to0 for assembler programming. The space parame-ter specifies the number of bytes to be reserved forthe private use of the function. Once the enter in-struction has executed, the data can be accessed atnegative offsets from the ebp register.
Releasing stack space dynamically
The last two instructions in a function should, asshown in algorithm be
leave
ret 0
The combined effect of these is to free the space re-served on the stack by enter, and pop the return ad-dress from the stack. The parameter to the operationcode name ret is used to specify how many bytesof function parameters should be discarded from thestack. If one is interfacing to C this should always beset to 0.
Label qualification
The default scope of a label is the assembler sourcefile containing the line it prefixes. But labels can beused to mark the start of functions that are to becalled from C or other high level languges. To indicatethat they have scope beyond the current asscemblerfile, the global directive should be used as shownin algorithm 1.The converse case, where an assembler file calls a
function exported by a C program is handled by theetern directive:extern printreal
call printreal
Label qualification
in the above example we assume that printrealis a C function called from assembler.
Linking and object file formats
There are 4 object file formats that are commonlyused on Linux and Windows systems as shown intable ??. This lists the name of the format, its fileextension - which is often ambiguous and the combi-nation of operating system and compiler that makesuse of it. A flag provided to Nasm specifies which for-mat it should use. We will only go into the use of thegcc compiler, since this is portable between Windowsand Linux.
Let us assume we have a C program called c2asm.cand an assembler file asmfromc.asm. Suppose wewish to combine these into a single executable mod-ule c2asm. We issue the following commands at theconsole:
nasm -felf -o asmfromc.o asmfromc.asm
gcc -oc2asm c2asm.c asmfromc.o
This assumes that we are working either under Linuxor under Cygwin. If we are using djgpp we type:
nasm -fcoff -o asmfromc.o asmfromc.asm
gcc -oc2asm c2asm.c asmfromc.o
Format Extension Operating System C++ Compilerwin32 .obj Windows Microsoft C++
obj .obj Windows Borland C++coff .o Windows Djgpp gcc.elf .o Windows Cygwin gcc.elf .o Linux gcc
Leading underbars
If working with djgpp all external labels in your pro-gram, whether imported with extern or imported withglobal must have a leading underbar character. Thusto call the C procedure printreal one would write:extern _printreal
call _printreal
whilst to export myfunc one would write
global _myfunc
_myfunc:enter 128,0
Useful x86 instructions
This is a very small subset of the available instruc-tions but should be enough for your purposes.
Data movement
mov mem, reg/litexample
mov [ebp+12],eax
mov dword[esp+4],12
meansstore the right operand in the memory location on
the left
mov reg, mem/reg/litexample
mov ebx,1
mov ecx,[esi+ebp]
mov eax,ebx
meansload the right operand into the register on the left
movss mem, regexample
movss [ebp+12],xmm0
meansstore the right operand in the memory location on
the left. The right operand is the bottom 32 bits of anxmm register.
movss reg, mem/regexample
movss xmm1,[esi+ebp]
movss xmm1,xxm2
meansload the right operand into the register on the left,
the left operand is the lower 32 bits of a xmm registerand the data should be a 32 bit float
movups
movups mem, regexample
movss [ebp+12],xmm0
meansstore the right operand in the memory location on
the left. The right operand is a 128 bit xmm register.
movss
movss reg, memexample
movups xmm1,[esi+ebp]
meansload the right operand into the register on the left,
the left operand is a 128 bit xmm register
push
push mem/reg/litexample
push dword 10
push dword[esi+ebp+40]
push ecx
meanspush the operand on stack, pre-decrementing the
esp register by 4
pop
pop mem/regexample
pop dword[esi+ebp+40]
pop ecx
meansthe operand is assigned the value on the top of stack
and the stack pointer is then incremented by 4
fld
fld memexample
fld dword[esi+ebp+40]
meansthe operand which is assumed to be a 32 bit floating
point value is pushed on the fpu stack
fild
fild memexample
fild dword[esi+ebp+40]
meansthe 32bit integer operand is pushed on the fpu stack
as a floating point number
fstp
fstp memexample
fstp dword[esi+ebp+40]
meansthe operand is assigned the 32bit floating point value
on the fpu stack the fpu stack is then popped
fistp
fistp memexample
fistp dword[esi+ebp+40]
meansthe 32bit floating point value on the fpu stack is con-
verted to an integer and stored in the operand, thefpu stack is then popped.
Arithmetic
Integer arithmetic instructions can be divided into 3classes
1. Add, subtract, and, or, xor.These are treated ab-solutely regularly as two operand instructions asshown below in section ??.
2. Multiply, this comes in both 2 and 3 operand forms.
3. Divide and Modulus, these are irregular and makeuse of specific registers
Regular integer arithmetic
These take the formoperation dest, srcand mean dest:= dest operation srcthe following operation codes are allowed
add, sub, and, or, xor
The table shows the allowed combinations of desti-nation and sourceOperand combinations for regular arithmetic
dest srcregister registerregister constantregister memorymemory registermemory constantExamples
add esp,5
sub eax, ebx
and [eax+12],ebp
add dword[esi+edi],1
add esi,[edi]
Multiply
imul reg,reg/memThis is functionally the same as the regular 2 operand
integer arithmetic instructions.Example
imul ebx, dword [ebp-26]
imul reg,reg,constThis three operand form is particularly useful for com-
puting array offsets.Example
imul esi,eax,16
Divide/modulus
A single instruction is used for both division and mod-ulus.idiv reg/memThe 64 bit value in edx:eax is divided by the operand,
the quotient is placed in eax, and the remainder isplaced in edx.Example
idiv [ebp+64]
Floating point arithmetic
The floating point stack can be used to perform arith-metic in a postfix manner. The following fpu opcodesoperate on the top two items on the fpu stack:
faddp st1, fsubp st1, stafdivp st1
fmulp st1
These perfrom an operation between the top of thefpu stack (st0) and st1, store the result in st1, thenpop the stack so that st1 becomes the new top ofstack. Bear in mind that the maximum depth of thefpu stack is 8. Operations use 80bit internal floatingpoint.
Vector arithmetic
It is possible to perform parallel operations on vec-tors of 32 bit floats using the xmm registers. Theseinstructions have the general formatoperationPS xmmreg,xmmregFor example
mulps xmm0,xmm5
the suffix PS stands for Packed Single precison floats.In this case the 4 floats in xmm0 are multiplied by thecorresponding floats in xmm5 and the result stored inxmm0.
The other useful vector arithmetic instructions in thiscontext are:
addps, subps, divps
These instructions also exist in a memory to registerform but for these to be used you have to guaran-tee that the operands are aligned on 16 byte mem-ory boundaries. Since this is complicated to ensure,I suggest that you restrict yourself to the register toregister forms of these instructions.
Scalar arithmetic
It is also possible to perform scalar arithmetic in thelow order 32 bit words of the xmm registers. For in-stance, you can do all of the vector operations by us-ing the subscript SS standing for Scalar Single preci-sion after the operation thus:
addss xmm2,xmm0
would add the bottom 32 bit float in xmm0 to the bot-tom float in xmm2 and leave the result in xmm2.
Conversion instructions
operation dest srccvtsi2ss xmm register general registercvtsi2ss xmm register memorycvtss2si general register xmm registercvtss2si memory xmm register
If you are going to use these scalar instructions it is worth taking note of the conversion instructions cvtsi2ss and cvtss2siwhich convert signed doubleword integers to single precision floats and vice versa.
Examples
cvtsi2ss xmm4, ebx
cvtsi2ss xmm3, [ebp+20]
cvtss2si eax, xmm0
Integer comparisons
Comparison instructions exist which will place theresults of comparison in the flags. The cmp instruc-tion compares two integers.Examples
cmp eax, 12
cmp eax, ecx
cmp ebx, [ebx+16]
Set
The result of the comparison is written to the flagsand can be used either by a SET instruction or by aconditional jump instruction. For instance to test if theeax register was less then 10 we could write
cmp eax,10
setl bl
At the end of this the bl register will contain a booleanvalue of 1 if eax had been less than 10 and 0 if it hadbeen greater than 10. The suffixes used by the SETinstruction indicate which comparison is being tested.The suffixes that are most likely to be of use to youare L, G and E standing for <, >, and =.
fcomip st0,st1
The instruction fcomip compares the top two ele-ments of the floating point stack, popping the top onefrom the stack and placing the result of the compari-son in the cpu flags.It may be necessary to discard the next item on the
fpu stack using an fincstp instruction which incre-ments the floating point stack pointer.
cmpss
There are a family of comparison operations thatwork between scalar xmm registers. These leave aninteger result in one of the registers thus:
cmpltss xmm2,xmm4
would compare the float in the bottom 32 bits of xmm2with the corresponding float in xmm4 and set xmm2to all 1s if xmm2 was less than xmm4, otherwise itwould set xmm2 to zero.
Scalar comparisons
instruction meanscmpltss xmma,xmmb xmma<xmmb
cmpeqss xmma,xmmb xmma=xmmbcmpnless xmma,xmmb xmma>xmmb
comiss
Comiss is an alternative technique for performingscalar comparisons it compares the contents of twoxmm registers and returns the results in the cpu flags.Example
comiss xmm1, xmm7
Branches
Branches can be unconditional and direct:
jmp lab
or uncoditional and indirect:
jmp dword[ebp+10]
or conditional on a condition code and direct:
jl lab1
jg lab3
je lab4
Calls
Calls can be direct:
call lab
or indirect:
call dword[ebp+10]
in either case the current value of the eip registeris pushed on the stack and the eip register loadedfrom the operand. Returns are perfomed using theret instruction which pops the top of stack into theeip register.