machine-level programming: x86-64


Upload: herman

Post on 06-Feb-2016


Page 1: Machine-Level Programming: X86-64

Machine-Level Programming: X86-64

Topics
  Registers
  Stack
  Function Calls
  Local Storage

X86-64.ppt

CS 105
Tour of Black Holes of Computing

Page 2: Machine-Level Programming: X86-64

– 2 – 105

Interesting Features of x86-64

Pointers and long integers are 64 bits

Arithmetic ops support 8-, 16-, 32-, and 64-bit ints

There are 16 general-purpose registers

Full backward compatibility with IA32 (32-bit x86) code

Conditional ops are implemented using conditional move instructions (when possible); often faster than a branch

Much of the program state is held in registers
  Up to 6 integer or pointer arguments passed via registers (not the stack)
  Some procedures do not need to access the stack at all

Floating-point ops implemented using register-oriented instructions rather than stack-based instructions

With the move to x86-64, gcc was updated to take advantage of many architecture changes

Page 3: Machine-Level Programming: X86-64

Data Representations: IA32 + x86-64

Sizes of C Objects (in Bytes)

C Data Type    Typical 32-bit   Intel IA32   x86-64
unsigned       4                4            4
int            4                4            4
long int       4                4            8
char           1                1            1
short          2                2            2
float          4                4            4
double         8                8            8
long double    8                10/12        16
char *         4                4            8      (or any other pointer)

x86-64: 64-bit long ints and 64-bit pointers (int stays 32 bits)
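The x86-64 column can be checked on a given machine with sizeof. The following is a minimal sketch; the names c_sizes, get_c_sizes, and is_lp64 are made up for this illustration and are not part of the slides. The results depend on the compiler and ABI: on Linux x86-64 (LP64) the values are expected to match the x86-64 column, while other platforms may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes, in bytes, of the C types listed in the table (hypothetical helper type). */
struct c_sizes {
    size_t char_size, short_size, int_size, long_size;
    size_t float_size, double_size, long_double_size, pointer_size;
};

/* Collect the sizes as seen by the compiler that builds this file. */
struct c_sizes get_c_sizes(void)
{
    struct c_sizes s;
    s.char_size = sizeof(char);
    s.short_size = sizeof(short);
    s.int_size = sizeof(int);
    s.long_size = sizeof(long);
    s.float_size = sizeof(float);
    s.double_size = sizeof(double);
    s.long_double_size = sizeof(long double);
    s.pointer_size = sizeof(char *);
    return s;
}

/* Returns 1 if int is 4 bytes while long and pointers are 8 bytes (LP64). */
int is_lp64(const struct c_sizes *s)
{
    assert(s != NULL);
    return s->int_size == 4 && s->long_size == 8 && s->pointer_size == 8;
}
```

Printing the fields with the %zu format of printf gives one row of the table for the machine at hand.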

Page 4: Machine-Level Programming: X86-64

x86-64 Integer Registers (16)

64-bit   32-bit      64-bit   32-bit
%rax     %eax        %r8      %r8d
%rbx     %ebx        %r9      %r9d
%rcx     %ecx        %r10     %r10d
%rdx     %edx        %r11     %r11d
%rsi     %esi        %r12     %r12d
%rdi     %edi        %r13     %r13d
%rsp     %esp        %r14     %r14d
%rbp     %ebp        %r15     %r15d

Extend the 8 existing registers to 64 bits and add 8 new ones (%r8 to %r15)
Word (16-bit) and byte (8-bit) registers are still available; for %rax, %rbx, %rcx, and %rdx, each of the low 2 bytes can be accessed separately (e.g., %al and %ah)
Make %ebp/%rbp general purpose
Note the 32-bit names (%eax, ..., %r15d)

Page 5: Machine-Level Programming: X86-64

Instructions

Long word l (4 bytes) ↔ Quad word q (8 bytes)

New instructions:
  movl → movq
  addl → addq
  sall → salq
  etc.

32-bit instructions that generate 32-bit results
  Set the higher-order 32 bits of the destination register to 0
  Example: addl
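The zeroing rule applies only to 32-bit destinations; writes to the 16-bit and 8-bit register names leave the rest of the register unchanged. The sketch below models this in plain C. It is an illustration, not processor code, and the function name write_low is an assumption made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of writing `value` into the low `width` bits of a 64-bit register
 * whose current contents are `reg`; returns the new register contents.
 *   width 32 (e.g., %eax): upper 32 bits are set to 0
 *   width 16 (e.g., %ax):  upper 48 bits are preserved
 *   width 8  (e.g., %al):  upper 56 bits are preserved
 */
uint64_t write_low(uint64_t reg, uint64_t value, unsigned width)
{
    assert(width == 8 || width == 16 || width == 32);
    if (width == 32)
        return value & UINT64_C(0xFFFFFFFF);   /* old upper half is discarded */
    uint64_t mask = (UINT64_C(1) << width) - 1;
    return (reg & ~mask) | (value & mask);      /* old upper bits survive */
}
```

For example, write_low(0xFFFFFFFFFFFFFFFF, 1, 32) yields 1, whereas the same call with width 8 yields 0xFFFFFFFFFFFFFF01.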

Page 6: Machine-Level Programming: X86-64

Swap in 32-bit Mode

void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}

swap:
    # Setup
    pushl %ebp
    movl  %esp,%ebp
    pushl %ebx

    # Body
    movl  12(%ebp),%ecx     # ecx = yp
    movl  8(%ebp),%edx      # edx = xp
    movl  (%ecx),%eax       # eax = *yp (t1)
    movl  (%edx),%ebx       # ebx = *xp (t0)
    movl  %eax,(%edx)       # *xp = t1
    movl  %ebx,(%ecx)       # *yp = t0

    # Finish
    movl  -4(%ebp),%ebx
    movl  %ebp,%esp
    popl  %ebp
    ret

Page 7: Machine-Level Programming: X86-64

Swap in 64-bit Mode

Operands passed in registers (why useful?)
  First (xp) in %rdi, second (yp) in %rsi
  64-bit pointers

No stack operations required; in fact, no stack frame at all

32-bit data
  Data held in registers %eax and %edx
  movl operation

void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}

swap:
    movl (%rdi), %edx       # edx = *xp (t0)
    movl (%rsi), %eax       # eax = *yp (t1)
    movl %eax, (%rdi)       # *xp = t1
    movl %edx, (%rsi)       # *yp = t0
    retq
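To experiment with this example, the C source can be compiled on its own. The version below is a sketch that adds a null-pointer assert not present on the slide. With gcc, building with something like `gcc -O1 -DNDEBUG -S swap.c` (the file name is an assumption) removes the assert and should give assembly similar to the listing, although the exact instructions and register choices vary by compiler version.

```c
#include <assert.h>
#include <stddef.h>

/* Exchange the two ints that xp and yp point to. */
void swap(int *xp, int *yp)
{
    assert(xp != NULL && yp != NULL);  /* added for this sketch only */
    int t0 = *xp;   /* 32-bit load through a 64-bit pointer */
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}
```

Calling swap(&a, &b) with a = 1 and b = 2 leaves a = 2 and b = 1.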

Page 8: Machine-Level Programming: X86-64

x86-64 Integer Registers (6 used for arguments)

Register   Role               Register   Role
%rax       Return value       %r8        Argument #5
%rbx       Callee saved       %r9        Argument #6
%rcx       Argument #4        %r10       Caller saved
%rdx       Argument #3        %r11       Caller saved; used for linking
%rsi       Argument #2        %r12       Callee saved
%rdi       Argument #1        %r13       Callee saved
%rsp       Stack pointer      %r14       Callee saved
%rbp       Callee saved       %r15       Callee saved
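As a rough illustration of the argument registers, consider a function with eight integer parameters. Assuming the System V x86-64 calling convention used on Linux, the first six arrive in registers and the last two are passed on the stack. The function name digits8 is invented for this sketch; it packs its arguments into a decimal number so that the order of arrival is visible in the result.

```c
#include <assert.h>

/*
 * Assumed System V x86-64 argument placement:
 *   a: %rdi   b: %rsi   c: %rdx   d: %rcx   e: %r8   f: %r9
 *   g, h: passed on the stack by the caller
 * Each argument must be a single decimal digit (0..9).
 */
long digits8(long a, long b, long c, long d,
             long e, long f, long g, long h)
{
    assert(a >= 0 && a <= 9 && b >= 0 && b <= 9);
    assert(c >= 0 && c <= 9 && d >= 0 && d <= 9);
    assert(e >= 0 && e <= 9 && f >= 0 && f <= 9);
    assert(g >= 0 && g <= 9 && h >= 0 && h <= 9);
    /* Horner-style packing: a becomes the most significant digit. */
    return ((((((a * 10 + b) * 10 + c) * 10 + d) * 10 + e) * 10 + f) * 10 + g) * 10 + h;
}
```

Inspecting the compiler's assembly output for a call to this function (for example with gcc -S) should show the stack being used only for the seventh and eighth arguments.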

Page 9: Machine-Level Programming: X86-64

Interesting Features of Stack Frame

Allocate entire frame at once
  All stack accesses can be relative to %rsp
  Do so by decrementing stack pointer
  Can delay allocation, since it is safe to temporarily use the red zone: the 128 bytes available below the stack pointer

Simple deallocation
  Increment stack pointer
  No base/frame pointer needed

Page 10: Machine-Level Programming: X86-64

x86-64 Registers

Arguments passed to functions via registers
  If more than 6 integral parameters, then pass the rest on the stack
  These registers can be used as caller-saved as well

All references to stack frame via stack pointer
  Eliminates need to update %ebp/%rbp

Other Registers
  %r12-%r15, %rbx, %rbp are callee saved
  2 or 3 have special uses
    %rax holds the return value for a function
    %rsp is the stack pointer


Page 12: Machine-Level Programming: X86-64

Reading Condition Codes: x86-64

int gt (long x, long y)
{
    return x > y;
}

long lgt (long x, long y)
{
    return x > y;
}

Body (same for both):
    xorl %eax, %eax     # eax = 0
    cmpq %rsi, %rdi     # Compare x and y
    setg %al            # al = x > y

Is %rax zero?
  Yes: 32-bit instructions set the high-order 32 bits to 0!

SetX instructions: set a single byte based on a combination of condition codes
  Does not alter the remaining 7 bytes of the 64-bit register


Page 14: Machine-Level Programming: X86-64

Conditionals: x86-64

Conditional move instruction
  cmovC src, dest
  Move value from src to dest if condition C holds
  Can be more efficient than conditional branching (simple control flow, no branch to mispredict)
  But overhead: both branches are evaluated

int absdiff(int x, int y)
{
    int result;
    if (x > y) {
        result = x-y;
    } else {
        result = y-x;
    }
    return result;
}

absdiff:                    # x in %edi, y in %esi
    movl   %edi, %eax       # eax = x
    movl   %esi, %edx       # edx = y
    subl   %esi, %eax       # eax = x-y
    subl   %edi, %edx       # edx = y-x
    cmpl   %esi, %edi       # x:y
    cmovle %edx, %eax       # eax = edx if x <= y
    ret
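The structure of the conditional-move version can be written back in C: compute both candidate results first, then select one. The sketch below uses the invented name absdiff_cmov and adds a range check that is not on the slide, since signed overflow is undefined in C. Whether a compiler actually emits cmov for this source depends on the compiler and optimization level.

```c
#include <assert.h>
#include <limits.h>

/* Absolute difference written in "evaluate both, then select" form. */
int absdiff_cmov(int x, int y)
{
    /* Added for this sketch: both x-y and y-x must fit in an int. */
    long long d = (long long)x - (long long)y;
    assert(d <= INT_MAX && d >= -(long long)INT_MAX);

    int then_val = x - y;               /* corresponds to subl %esi, %eax */
    int else_val = y - x;               /* corresponds to subl %edi, %edx */
    return (x <= y) ? else_val : then_val;  /* candidate for cmovle */
}
```

This transformation is only valid when both branches are cheap and free of side effects. An expression such as p ? *p : 0 must not be evaluated this way, because dereferencing p when it is null would fault.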

Page 15: Machine-Level Programming: X86-64

x86-64 Procedure Summary

Heavy use of registers
  Parameter passing
  More temporaries since more registers

Minimal use of stack
  Sometimes none
  Allocate/deallocate entire block

Many tricky optimizations
  What kind of stack frame to use
  Calling with jump
  Various allocation techniques

Page 16: Machine-Level Programming: X86-64

IA32 Example Addresses

$esp            0xffffbcd0
p3              0x65586008
p1              0x55585008
p4              0x1904a110
p2              0x1904a008
&p2             0x18049760
beyond          0x08049744
big_array       0x18049780
huge_array      0x08049760
main()          0x080483c6
useless()       0x08049744
final malloc()  0x006be166

address range ~2^32

[Figure: IA32 address space with regions Stack, Heap, Data, and Text and address markers FF, 80, 08, 00; not drawn to scale]

malloc() is dynamically linked; its address is determined at runtime

Page 17: Machine-Level Programming: X86-64

x86-64 Example Addresses

$rsp            0x7ffffff8d1f8
p3              0x2aaabaadd010
p1              0x2aaaaaadc010
p4              0x000011501120
p2              0x000011501010
&p2             0x000010500a60
beyond          0x000000500a44
big_array       0x000010500a80
huge_array      0x000000500a50
main()          0x000000400510
useless()       0x000000400500
final malloc()  0x00386ae6a170

address range ~2^47

[Figure: x86-64 address space with regions Stack, Heap, Data, and Text and address markers 00007F, 000030, 000000; not drawn to scale]

malloc() is dynamically linked; its address is determined at runtime
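A table like this can be produced by recording the address of one object from each region. The sketch below is not the program behind the slide; the names sample_addrs, sample_addresses, global_data, and useless_fn are assumptions for illustration. The values differ from run to run on systems with address space layout randomization.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int global_data = 1;          /* initialized global: data region */

static void useless_fn(void) { }     /* function: text region */

/* One sample address from each region (hypothetical helper type). */
struct sample_addrs {
    uintptr_t stack, heap, data, text;
};

struct sample_addrs sample_addresses(void)
{
    struct sample_addrs s;
    int local = 0;                   /* automatic variable: stack */
    void *p = malloc(16);            /* dynamic allocation: heap */
    assert(p != NULL);

    s.stack = (uintptr_t)&local;
    s.heap  = (uintptr_t)p;
    s.data  = (uintptr_t)&global_data;
    s.text  = (uintptr_t)useless_fn; /* conversion result is implementation-defined */

    free(p);
    return s;
}
```

Printing each field in hexadecimal (for instance with the PRIxPTR macro from inttypes.h) on an x86-64 Linux system typically shows the stack address near the top of the 47-bit range and the text address near the bottom.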

Page 18: Machine-Level Programming: X86-64

x86-64 Summary

Pointers and long integers are 64 bits

16 registers, with 64-, 32-, 16-, and 8-bit uses

Minimal use of stack
  Parameters passed in registers
  Allocate/deallocate entire block for a procedure
  Can use the 128 bytes past (below) the stack pointer (red zone)
    Can use the stack without changing the stack pointer
  No frame pointer

Many tricky optimizations
  What kind of stack frame to use
  Calling with jump
  Various allocation techniques
  Compilers need to be smarter

Page 19: Machine-Level Programming: X86-64

x86-64 Locals in the Red Zone

Avoiding Stack Pointer Change
  Can hold all information within a small window beyond the stack pointer

/* Swap, using local array */
void swap_a(long *xp, long *yp)
{
    volatile long loc[2];
    loc[0] = *xp;
    loc[1] = *yp;
    *xp = loc[1];
    *yp = loc[0];
}

swap_a:
    movq (%rdi), %rax
    movq %rax, -24(%rsp)
    movq (%rsi), %rax
    movq %rax, -16(%rsp)
    movq -16(%rsp), %rax
    movq %rax, (%rdi)
    movq -24(%rsp), %rax
    movq %rax, (%rsi)
    ret

Stack layout relative to %rsp:
  %rsp + 0     rtn Ptr
  %rsp − 8     unused
  %rsp − 16    loc[1]
  %rsp − 24    loc[0]
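The decision to keep locals in the red zone can be sketched as a simple size check. This is a deliberately simplified model and not gcc's actual logic: it assumes that only leaf functions (those that make no calls) keep locals there, and that the red zone is the 128 bytes below %rsp. The names RED_ZONE_BYTES and fits_in_red_zone are invented for this example.

```c
#include <stddef.h>

/* Size of the red zone in bytes, as stated on the slides. */
#define RED_ZONE_BYTES 128

/*
 * Simplified model: returns 1 if a function's locals could live below
 * %rsp without moving the stack pointer, 0 otherwise.
 *   local_bytes: total bytes of stack-resident locals
 *   makes_calls: nonzero if the function calls other functions,
 *                since a call pushes a return address below %rsp
 */
int fits_in_red_zone(size_t local_bytes, int makes_calls)
{
    if (makes_calls)
        return 0;
    return local_bytes <= RED_ZONE_BYTES;
}
```

For swap_a, the two longs need 16 bytes and the function makes no calls, so the model returns 1, which is consistent with the listing never changing %rsp. With gcc, the -mno-red-zone option disables this use of the area below the stack pointer.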

Page 20: Machine-Level Programming: X86-64

x86-64 Stack Frame Example

Keeps values of a and i in callee-saved registers

Must set up a stack frame to save these registers

long sum = 0;
/* Swap a[i] & a[i+1] */
void swap_ele_su(long a[], int i)
{
    swap(&a[i], &a[i+1]);
    sum += a[i];
}

swap_ele_su:
    movq   %rbx, -16(%rsp)
    movslq %esi,%rbx
    movq   %r12, -8(%rsp)
    movq   %rdi, %r12
    leaq   (%rdi,%rbx,8), %rdi
    subq   $16, %rsp
    leaq   8(%rdi), %rsi
    call   swap
    movq   (%r12,%rbx,8), %rax
    addq   %rax, sum(%rip)
    movq   (%rsp), %rbx
    movq   8(%rsp), %r12
    addq   $16, %rsp
    ret


Page 22: Machine-Level Programming: X86-64

Understanding x86-64 Stack Frame

swap_ele_su:
    movq   %rbx, -16(%rsp)        # Save %rbx
    movslq %esi,%rbx              # Extend & save i
    movq   %r12, -8(%rsp)         # Save %r12
    movq   %rdi, %r12             # Save a (array pointer)
    leaq   (%rdi,%rbx,8), %rdi    # Create &a[i]
    subq   $16, %rsp              # Allocate stack frame
    leaq   8(%rdi), %rsi          # &a[i+1]
    call   swap                   # swap()
    movq   (%r12,%rbx,8), %rax    # a[i]
    addq   %rax, sum(%rip)        # sum += a[i]
    movq   (%rsp), %rbx           # Restore %rbx
    movq   8(%rsp), %r12          # Restore %r12
    addq   $16, %rsp              # Deallocate stack frame
    ret

Stack before subq $16, %rsp:      Stack after subq $16, %rsp:
  %rsp + 0     rtn addr             %rsp + 16    rtn addr
  %rsp − 8     %r12                 %rsp + 8     %r12
  %rsp − 16    %rbx                 %rsp + 0     %rbx
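A self-contained C version of this example makes it easy to check the behavior and to inspect the generated assembly. The sketch assumes a swap that takes long pointers, since the slide does not show its definition. Because swap is defined in the same file here, an optimizing compiler may inline it and drop the call, so the frame set up in the listing may not appear unless swap lives in a separate source file.

```c
#include <assert.h>
#include <stddef.h>

long sum = 0;

/* Assumed helper: exchange two longs (definition not shown on the slide). */
static void swap(long *xp, long *yp)
{
    long t0 = *xp;
    long t1 = *yp;
    *xp = t1;
    *yp = t0;
}

/* Swap a[i] and a[i+1], then add the new a[i] to the global sum. */
void swap_ele_su(long a[], int i)
{
    assert(a != NULL && i >= 0);   /* added for this sketch only */
    swap(&a[i], &a[i + 1]);
    sum += a[i];                   /* a and i must survive the call to swap */
}
```

The values of a and i are needed after the call returns, which is why the compiled code keeps them in the callee-saved registers %r12 and %rbx.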
