low level programming
Linux ABI
• System calls
  – Everything distills into a system call
    • /sys, /dev, /proc: accessed through read() & write() syscalls
• What is a system call?
  – Special-purpose function call
    • Elevates privilege
    • Executes a function in the kernel
  – But what is a function call?
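As a small illustration of the point that device files are driven through ordinary read() and write() system calls, the sketch below reads from /dev/zero and writes to /dev/null. The helper names dev_read and dev_write are hypothetical (they do not come from the slides), and the code assumes a Linux or other Unix-like system where both device nodes exist.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read up to n bytes from a device file via read(). */
ssize_t dev_read(const char *path, void *buf, size_t n)
{
    int fd = open(path, O_RDONLY);      /* open() is itself a system call */
    if (fd < 0)
        return -1;
    ssize_t got = read(fd, buf, n);     /* read() system call */
    close(fd);
    return got;
}

/* Hypothetical helper: write n bytes to a device file via write(). */
ssize_t dev_write(const char *path, const void *buf, size_t n)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t put = write(fd, buf, n);    /* write() system call */
    close(fd);
    return put;
}
```

Reading from /dev/zero should fill the buffer with zero bytes, and writing to /dev/null should report every byte as written; files under /sys and /proc are accessed the same way.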
What is a function call?
• Special form of jmp
  – Executes a block of code at a given address
  – Special instruction: call <fn-address>
  – Why not just use jmp?
• What do function calls need?
  – int foo(int arg1, char * arg2);
    • Location: foo()
    • Arguments: arg1, arg2, …
    • Return code: int
  – Must be implemented at the hardware level
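The three ingredients (location, arguments, return value) become visible in C when a function is called through a pointer: the pointer holds the location, and the call passes arguments and collects the result. This is only a sketch; the body given to foo and the helper call_at are invented for illustration. One reason call is used instead of jmp is that call also saves the return address, so ret can resume the caller.

```c
#include <stddef.h>

/* The prototype from the slide, with a body added purely for illustration. */
int foo(int arg1, char *arg2)
{
    return (arg2 != NULL) ? arg1 + 1 : -1;
}

/* Hypothetical helper: calls whatever function lives at the address in fn. */
int call_at(int (*fn)(int, char *), int arg1, char *arg2)
{
    /* On x86 this typically compiles to an indirect call instruction. */
    return fn(arg1, arg2);
}
```

Calling call_at(foo, 1, some_string) behaves exactly like foo(1, some_string); only the way the location is supplied differs.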
Hardware implementation
    0000000000000107 <foo>:
     107:  55                      push   %rbp
     108:  48 89 e5                mov    %rsp,%rbp
     10b:  89 7d fc                mov    %edi,-0x4(%rbp)
     10e:  48 89 75 f0             mov    %rsi,-0x10(%rbp)
     112:  b8 00 00 00 00          mov    $0x0,%eax
     117:  c9                      leaveq
     118:  c3                      retq

• Location
  – Address of function + ret instruction
• Arguments
  – Passed in registers (which ones? And why those?)
• Return code
  – Stored in register: EAX
• To understand this we need to know about assembly programming…

    int foo(int arg1, char * arg2) { return 0; }
Assembly basics
• What makes up assembly code?
  – Instructions
    • Architecture specific
  – Operands
    • Registers
    • Memory (specified as an address)
    • Immediates
  – Conventions
    • Rules of the road and/or behavior models
Registers
• General purpose
  – 16 bit: AX, BX, CX, DX, SI, DI
  – 32 bit: EAX, EBX, ECX, EDX, ESI, EDI
  – 64 bit: RAX, RBX, RCX, RDX, RSI, RDI + others
• Environmental
  – RSP (stack pointer), RIP (instruction pointer)
  – RBP = frame pointer, defines local scope
• Special uses
  – Calling conventions
    • RAX == return code
    • RDI, RSI, RDX, RCX… == ordered arguments
  – Hardware defined
    • Some instructions implicitly use specific registers
      – RSI/RDI: string instructions
      – RBP: leaveq
Memory
• x86 provides complex memory addressing capabilities
  – Immediate addressing (absolute address)
    • mov %rsi, 0xfff000
  – Direct addressing (address held in a register)
    • mov %rsi, (%rbp)
  – Offset addressing
    • mov %rsi, 0x8(%rax)
• Base + (Index * Scale) + Displacement
  – A.K.A. SIB (scale, index, base)
  – Occasionally seen
  – Hardly ever used by hand
  – movl %ebp, (%rdi,%rsi,4)
    • Address = rdi + rsi * 4
  – A more complicated example
    • segment:disp(base, index, scale)
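To make the disp(base, index, scale) form concrete, the sketch below uses lea, which runs the same address calculation but returns the address instead of accessing memory. The function name sib_address is hypothetical, and the code assumes GCC or Clang; on targets other than x86-64 it falls back to the equivalent C arithmetic so the file still compiles. The keyword is spelled __asm__ so that it is also accepted under strict -std=c11.

```c
#include <stdint.h>

/* Returns base + index*4 + 8, computed by the addressing hardware via lea. */
uintptr_t sib_address(uintptr_t base, uintptr_t index)
{
    uintptr_t addr;
#if defined(__x86_64__)
    /* 8(%1,%2,4) is disp(base, index, scale) with disp=8 and scale=4. */
    __asm__("leaq 8(%1,%2,4), %0" : "=r"(addr) : "r"(base), "r"(index));
#else
    addr = base + index * 4 + 8;    /* portable fallback, same arithmetic */
#endif
    return addr;
}
```

For an int array a, sib_address((uintptr_t)a, 3) should equal the address of a[5], because 3 * 4 + 8 = 20 bytes is five 4-byte elements; this is the kind of indexing compilers emit SIB operands for.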
8/16/32/64 bit operands
• Programmer explicitly specifies operand length with an instruction suffix (b, w, l, q)
• Example: mov reg, reg
  – 8 bits: movb %al, %bl
  – 16 bits: movw %ax, %bx
  – 32 bits: movl %eax, %ebx
  – 64 bits: movq %rax, %rbx
• What about “movl %ebx, (%rdi)”?
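One way to explore the closing question: a memory operand has no size of its own, so the l suffix (matching the 32-bit source register) decides that exactly 4 bytes are stored at the address. The sketch below stores into a 64-bit cell and shows that only its low half changes. The name store32 is hypothetical; the code assumes GCC or Clang on x86-64 and otherwise falls back to equivalent C for a little-endian layout.

```c
#include <stdint.h>

/* Stores the low 32 bits of val into cell using movl; returns the new cell. */
uint64_t store32(uint64_t cell, uint32_t val)
{
#if defined(__x86_64__)
    /* The l suffix selects a 32-bit store, so only 4 bytes of cell are written.
       "+m" marks cell as read-write because its upper half must be preserved. */
    __asm__("movl %1, %0" : "+m"(cell) : "r"(val));
#else
    cell = (cell & 0xFFFFFFFF00000000ULL) | val;   /* little-endian equivalent */
#endif
    return cell;
}
```

Starting from a cell of all ones, storing 0x12345678 leaves the upper 32 bits untouched.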
Function call implementation
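We can now decode what is going on here

    int foo(int arg1, char * arg2) { return 0; }

    0000000000000107 <foo>:
     107:  55                      push   %rbp               # save the caller's frame pointer
     108:  48 89 e5                mov    %rsp,%rbp          # start a new stack frame
     10b:  89 7d fc                mov    %edi,-0x4(%rbp)    # arg1 (32-bit int) arrives in EDI
     10e:  48 89 75 f0             mov    %rsi,-0x10(%rbp)   # arg2 (64-bit pointer) arrives in RSI
     112:  b8 00 00 00 00          mov    $0x0,%eax          # return value 0 goes in EAX
     117:  c9                      leaveq                    # restore RSP and RBP
     118:  c3                      retq                      # pop the return address and jump to it

• Location
  – Address of function + ret instruction
• Arguments
  – Passed in registers (which ones? And why those?)
• Return code
  – Stored in register: EAX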
OS development requires assembly programming
• Some OS operations cannot be expressed in a higher-level language
  – Examples: atomic operations, page table management, configuring segments
  – System calls(!)
• How to mix assembly with OS code (in C)
  – Compile with assembler and link with C code
    • .S files assembled with gas
  – Inline w/ compiler support
    • .c files compiled with gcc
Implementing assembler functions
• C functions:
  – Location, args, return code
• ASM functions:
  – Location only
  – Programmer must implement everything else
    • Arguments, context, return values
    • Everything in foo() from before + function body
    • Programmer takes place of compiler
  – Must match calling conventions
Calling assembler functions
• Programmer implements calling convention
  – Behaves just like a regular function
• Only need location
  – Linker takes care of the rest

foo.S:

    .globl foo          # defines a global symbol
    foo:
        push %rbp
        mov %rsp, %rbp
        …

main.c:

    extern int foo(int, char *);

    int main() {
        int x = foo(1, "test");
    }
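The foo.S / main.c split above can be imitated in a single C file with a top-level asm block, which is convenient for experimenting. This is a sketch under assumptions, not the deck's own code: the function asm_add is hypothetical, and the assembly path assumes GCC or Clang targeting x86-64 Linux (ELF); elsewhere a plain C definition is substituted so the file still builds.

```c
#if defined(__x86_64__) && defined(__linux__)
/* Assembly implementation, playing the role of foo.S. */
__asm__(
    ".pushsection .text\n"
    ".globl asm_add\n"              /* export the symbol so the linker can find it */
    ".type asm_add, @function\n"
    "asm_add:\n"
    "    movl %edi, %eax\n"         /* first int argument arrives in EDI */
    "    addl %esi, %eax\n"         /* second int argument arrives in ESI */
    "    ret\n"                     /* result is returned in EAX */
    ".popsection\n"
);
#else
/* Fallback for other targets: same behavior in plain C. */
int asm_add(int a, int b) { return a + b; }
#endif

/* C side, playing the role of main.c: only the location and signature are declared. */
extern int asm_add(int a, int b);
```

From C, asm_add(1, 2) is then an ordinary call. Because this function is a leaf that needs no locals, it skips the push %rbp / mov %rsp, %rbp prologue; the calling convention only requires that arguments and the return value sit in the agreed registers.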
Inline
• OS only needs a few full-blown assembly functions
  – Context switches, interrupt handling, a few others
• Most of the time just need to execute a single instruction
  – e.g. set a bit in a control register
• GCC provides the ability to incorporate inline assembly instructions into a regular .c file
  – Not a function
  – Compiler handles argument marshaling
Overview
• Inline assembly includes 2 components
  – Assembly code
  – Compiler directives for operand marshaling

    asm ( assembler template
        : output operands                  /* optional */
        : input operands                   /* optional */
        : list of clobbered registers      /* optional */
        );
Inline assembly execution
• Sequence of individual assembly instructions
  – Can execute any hardware instruction
  – Can reference any register or memory location
  – Can reference specified variables in C code
• 3 stages of execution
  1. Load C variables into correct registers or memory
  2. Execute assembly instructions
  3. Copy register and memory contents into C variables
Specifying inline operands
• How does compiler copy C variables to/from registers?
• C variables and registers are explicitly linked in asm specification
  – Sections for input and output operands
  – Compiler handles copying to and from variables before and after assembly executed
  – Assembly code references marshaled values (index of operand) instead of raw registers
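A minimal sketch of operand indices in practice: the template below never names a concrete register, only %0 (the output) and %1 (the input), and the compiler performs the copy-in and copy-out stages around it. The function byteswap32 is a hypothetical example; the code assumes GCC or Clang on x86-64 and uses a plain C fallback elsewhere.

```c
#include <stdint.h>

/* Reverses the byte order of src using the bswap instruction. */
uint32_t byteswap32(uint32_t src)
{
    uint32_t dst;
#if defined(__x86_64__)
    __asm__("movl %1, %0\n\t"       /* copy the input operand into the output register */
            "bswapl %0"             /* reverse the bytes in place */
            : "=r"(dst)             /* operand 0: output, any general register */
            : "r"(src));            /* operand 1: input, any general register */
#else
    dst = (src >> 24) | ((src >> 8) & 0xFF00u) |
          ((src << 8) & 0xFF0000u) | (src << 24);
#endif
    return dst;
}
```

Here the compiler chooses the registers, loads src before the template runs, and stores the chosen output register into dst afterwards, which matches the three stages listed earlier.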
Operand Codes
• Wide range of operand codes (“constraints”) are available
  – Input: "code"(c-variable)
  – Output: "=code"(c-variable)

Explicit register codes
  a = %rax, %eax, %ax
  b = %rbx, %ebx, %bx
  c = %rcx, %ecx, %cx
  d = %rdx, %edx, %dx
  S = %rsi, %esi, %si
  D = %rdi, %edi, %di

Other operand codes
  r = any general register
  q = a, b, c, d regs (in 32-bit mode; any integer register in 64-bit mode)
  m = memory operand
  f = floating point reg
  i = immediate
  g = anything (general register, memory, or immediate)

And many more…
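Explicit register codes matter most for instructions with fixed, implicit operands. As a sketch, the one-operand mull instruction multiplies EAX by its operand and leaves the 64-bit product in EDX:EAX, so the "a" and "d" codes are used to pin the C variables to those registers. The function mul_wide is hypothetical; the code assumes GCC or Clang on x86-64, with a C fallback for other targets.

```c
#include <stdint.h>

/* 32x32 -> 64-bit unsigned multiply using the one-operand mull instruction. */
uint64_t mul_wide(uint32_t x, uint32_t y)
{
#if defined(__x86_64__)
    uint32_t lo, hi;
    /* Operands are numbered in order: %0=lo, %1=hi, %2=x, %3=y. */
    __asm__("mull %3"
            : "=a"(lo), "=d"(hi)    /* outputs pinned to EAX and EDX */
            : "a"(x), "r"(y)        /* x must start in EAX; y may be in any register */
            : "cc");                /* mull modifies the flags */
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)x * y;
#endif
}
```

For example, 0xFFFFFFFF times 2 does not fit in 32 bits; the high half lands in EDX and is recombined in C.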
Register example

    int foo(int arg1, char * arg2) {
        int a=10, b;

        asm ("movl %1, %%ecx;\n"
             "movl %%ecx, %0;\n"
             : "=b"(b)      /* output */
             : "a"(a)       /* input */
             : );

        return 0;
    }

What does this do?

    0000000000000107 <foo>:
     107:  55                      push   %rbp
     108:  48 89 e5                mov    %rsp,%rbp
     10b:  53                      push   %rbx
     10c:  89 7d e4                mov    %edi,-0x1c(%rbp)
     10f:  48 89 75 d8             mov    %rsi,-0x28(%rbp)
     113:  c7 45 f0 0a 00 00 00    movl   $0xa,-0x10(%rbp)
     11a:  8b 45 f0                mov    -0x10(%rbp),%eax
     11d:  89 c1                   mov    %eax,%ecx
     11f:  89 cb                   mov    %ecx,%ebx
     121:  89 d8                   mov    %ebx,%eax
     123:  89 45 f4                mov    %eax,-0xc(%rbp)
     126:  b8 00 00 00 00          mov    $0x0,%eax
     12b:  5b                      pop    %rbx
     12c:  c9                      leaveq
     12d:  c3                      retq
Memory example
• x86 can also use memory (SIB, etc.) operands
  – “m” operand code

    int foo(int arg1, char * arg2) {
        int a=10, b;

        asm ("movl %1, %%ecx;\n"
             "movl %%ecx, %0;\n"
             : "=m"(b)
             : "m"(a)
             : );

        return 0;
    }

    0000000000000107 <foo>:
       0:  55                      push   %rbp
       1:  48 89 e5                mov    %rsp,%rbp
       4:  89 7d ec                mov    %edi,-0x14(%rbp)
       7:  48 89 75 e0             mov    %rsi,-0x20(%rbp)
       b:  c7 45 fc 0a 00 00 00    movl   $0xa,-0x4(%rbp)
      12:  8b 4d fc                mov    -0x4(%rbp),%ecx
      15:  89 4d f8                mov    %ecx,-0x8(%rbp)
      18:  b8 00 00 00 00          mov    $0x0,%eax
      1d:  c9                      leaveq
      1e:  c3                      retq
Input/output operands
• Sometimes input and output operands are the same variable
  – Transform input variable in some way

    int foo(int arg1, char * arg2) {
        int a=10, b=5;

        asm ("addl %1, %0;\n"
             : "=r"(b)
             : "m"(a), "0"(b)
             : );

        return 0;
    }

    0000000000000107 <foo>:
       0:  55                      push   %rbp
       1:  48 89 e5                mov    %rsp,%rbp
       4:  89 7d ec                mov    %edi,-0x14(%rbp)
       7:  48 89 75 e0             mov    %rsi,-0x20(%rbp)
       b:  c7 45 f8 0a 00 00 00    movl   $0xa,-0x8(%rbp)
      12:  c7 45 fc 05 00 00 00    movl   $0x5,-0x4(%rbp)
      19:  8b 45 fc                mov    -0x4(%rbp),%eax
      1c:  03 45 f8                add    -0x8(%rbp),%eax
      1f:  89 45 fc                mov    %eax,-0x4(%rbp)
      22:  b8 00 00 00 00          mov    $0x0,%eax
      27:  c9                      leaveq
      28:  c3                      retq
Input/output operands (2)
• Input/output operands can also be specified with “+”

    int foo(int arg1, char * arg2) {
        int a=10, b=5;

        asm ("addl %1, %0;\n"
             : "+r"(b)
             : "m"(a)
             : );

        return 0;
    }

    0000000000000107 <foo>:
       0:  55                      push   %rbp
       1:  48 89 e5                mov    %rsp,%rbp
       4:  89 7d ec                mov    %edi,-0x14(%rbp)
       7:  48 89 75 e0             mov    %rsi,-0x20(%rbp)
       b:  c7 45 f8 0a 00 00 00    movl   $0xa,-0x8(%rbp)
      12:  c7 45 fc 05 00 00 00    movl   $0x5,-0x4(%rbp)
      19:  8b 45 fc                mov    -0x4(%rbp),%eax
      1c:  03 45 f8                add    -0x8(%rbp),%eax
      1f:  89 45 fc                mov    %eax,-0x4(%rbp)
      22:  b8 00 00 00 00          mov    $0x0,%eax
      27:  c9                      leaveq
      28:  c3                      retq
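The two ways of writing an input/output operand can be placed side by side in functions that actually return the result, which makes them easy to test. This is a sketch; the names add_matching and add_readwrite are hypothetical, and the code assumes GCC or Clang on x86-64 with a plain C fallback elsewhere.

```c
/* Form 1: separate output plus a matching-constraint input.
   "0" means: place this input in the same location as operand 0. */
int add_matching(int a, int b)
{
#if defined(__x86_64__)
    __asm__("addl %1, %0" : "=r"(b) : "m"(a), "0"(b) : "cc");
#else
    b += a;
#endif
    return b;
}

/* Form 2: a single read-write operand marked with "+". */
int add_readwrite(int a, int b)
{
#if defined(__x86_64__)
    __asm__("addl %1, %0" : "+r"(b) : "m"(a) : "cc");
#else
    b += a;
#endif
    return b;
}
```

Both compute a + b; with a = 10 and b = 5, as in the slides, each should return 15. The "cc" clobber records that addl modifies the flags.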
Clobbered list
• We cheated earlier…
• How does compiler know to save/restore ECX?
  – It doesn’t

    int foo(int arg1, char * arg2) {
        int a=10, b;

        asm ("movl %1, %%ecx;\n"
             "movl %%ecx, %0;\n"
             : "=m"(b)
             : "m"(a)
             : );

        return 0;
    }

• We must explicitly tell compiler what registers have been implicitly messed with
  – In this case ECX, but other instructions have implicit operands (CHECK THE MANUALS)
• Second set of constraints to inline assembly
  – Clobber list: registers not used as either input or output operands that must still be saved/restored by compiler
Why clobber list?
• Why do we need this?
  – Compilers try to optimize performance
    • Cache intermediate values and assume values don’t change
    • Compiler cannot inspect ASM behavior
      – Outside scope of compiler
• Clobber lists tell compiler:
  – “You cannot trust the contents of these resources after this point”
  – Or “Do not perform optimizations that span this block on these resources”
Using clobber lists
• ECX is used implicitly so its value must be saved/restored
• What about “memory”?
    int foo(int arg1, char * arg2) {
        int a=10, b;

        asm ("movl %1, %%ecx;\n"
             "movl %%ecx, %0;\n"
             : "=m"(b)
             : "m"(a)
             : "ecx", "memory" );

        return 0;
    }
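On the "memory" question: that clobber tells the compiler the assembly may read or write memory that is not listed as an operand, so values cached in registers cannot be trusted across the statement. The sketch below is a case where it is genuinely needed: rep movsb copies bytes through pointers, using RDI, RSI and RCX implicitly. The function copy_bytes is hypothetical; the code assumes GCC or Clang on x86-64 (with the direction flag clear, as the ABI guarantees on function entry) and falls back to a C loop elsewhere.

```c
#include <stddef.h>

/* Copies n bytes from src to dst with the x86 string instruction rep movsb. */
void copy_bytes(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__)
    /* rep movsb implicitly uses RDI (destination), RSI (source) and RCX (count)
       and modifies all three, so each one is declared as a read-write operand. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");   /* bytes behind the pointers are read and written */
#else
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
}
```

Without "memory", the compiler would be free to assume the destination buffer is unchanged by the statement, since only the pointer values appear in the operand lists.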
Back to system calls
• Function calls are not that special
  – Just an abstraction built on top of hardware
• System calls are basically function calls
  – With a few minor changes
    • Privilege elevation
    • Constrained entry points
      – Functions can call to any address
      – System calls must go through “gates”
Implementing system calls
• System calls are implemented as a single function call: syscall()
  – read() and write() actually just invoke syscall()
• What does syscall do?
  – Enters into the kernel at a known location
  – Elevates privilege
  – Instantiates kernel level environment
• Once inside the kernel, an appropriate system call handler is invoked based on arguments to syscall()
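The single-entry-point idea can be tried from user space with the syscall() wrapper that glibc exposes on Linux, passing the system call number as the first argument. This is a sketch assuming Linux with glibc; the wrapper names write_via_syscall and getpid_via_syscall are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* write() expressed through the generic syscall() entry point. */
long write_via_syscall(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}

/* getpid() expressed the same way; this call takes no arguments. */
long getpid_via_syscall(void)
{
    return syscall(SYS_getpid);
}
```

The result of getpid_via_syscall() should match getpid(), and a write to an invalid descriptor should fail with -1 just as write() would, since the wrapper converts kernel error codes into errno.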
x86 and Linux
• Number of different mechanisms for implementing syscall
  – Legacy: int 0x80 (invokes a single interrupt handler)
  – 32 bit: SYSENTER (special instruction that sets up a preset kernel environment)
  – 64 bit: SYSCALL (64 bit version of SYSENTER)
• All jump to a preconfigured execution environment inside kernel space
  – Either interrupt context or OS defined context
• What about arguments?
  – syscall(int syscall_num, args…)
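On the arguments question, a sketch of the 64-bit path: on x86-64 Linux the system call number goes in RAX, the first three arguments in RDI, RSI and RDX, and the result comes back in RAX, while the SYSCALL instruction itself overwrites RCX and R11. The function raw_syscall3 is hypothetical; the inline assembly assumes GCC or Clang on x86-64 Linux, and other targets fall back to the libc syscall() wrapper.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Issues a system call with up to three arguments. */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)                            /* result in RAX */
                     : "a"(nr), "D"(a1), "S"(a2), "d"(a3)   /* RAX, RDI, RSI, RDX */
                     : "rcx", "r11", "memory");             /* SYSCALL overwrites RCX and R11 */
    return ret;
#else
    return syscall(nr, a1, a2, a3);    /* fallback through libc */
#endif
}
```

Note one difference from the libc wrapper: the raw instruction returns a negative error code in RAX on failure instead of setting errno. A call such as raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as getpid().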
Specific system calls
• Each system call has a number assigned to it
  – Index into a system call table
    • Function pointers referencing each syscall handler
• syscall(int syscall_num, args…)
  – Sets up kernel environment
  – Invokes syscall_table[syscall_num](args…);
  – Returns to user space:
    • Resets environment to state before call
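The table lookup can be modeled in ordinary user-space C as an array of function pointers indexed by call number. This is a toy model and not kernel code: the handlers sys_add and sys_neg, the numbers NR_ADD and NR_NEG, and the dispatch function are all invented for illustration.

```c
#include <stddef.h>

/* Every handler has the same signature so they can share one table. */
typedef long (*syscall_fn)(long, long, long);

static long sys_add(long a, long b, long c) { (void)c; return a + b; }
static long sys_neg(long a, long b, long c) { (void)b; (void)c; return -a; }

enum { NR_ADD = 0, NR_NEG = 1, NR_MAX };

/* The "system call table": function pointers indexed by call number. */
static const syscall_fn syscall_table[NR_MAX] = {
    [NR_ADD] = sys_add,
    [NR_NEG] = sys_neg,
};

/* Validates the number, then invokes syscall_table[nr](args...). */
long dispatch(long nr, long a, long b, long c)
{
    if (nr < 0 || nr >= NR_MAX || syscall_table[nr] == NULL)
        return -1;      /* a real Linux kernel returns -ENOSYS here */
    return syscall_table[nr](a, b, c);
}
```

The bounds check matters: the number comes from untrusted user code, so it has to be validated before it is used as an index.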