information management cs165/csci e-268cs161/notes/security-part-one.pdf · making new processes:...
TRANSCRIPT
What computer security is really about
Understanding humanity
Understanding
how systems work
Hardware
OS
App App App
Theoretical cryptography
Applied cryptography
Race Conditions:
Non-atomic System Call Pairs//Process Xsys_call0();sys_call1();
//Process Ydo_evil()
Time Time
Time
Linux: Mapping Humans to Privileges• Each user has a user ID (UID), with root having UID 0
• Each user also has a group ID (GID), but we’ll mostly ignore
groups today
• Each file has:
• Read/write/execute permissions for the file’s owner, the file’s
group, and the world (i.e., all other users)
• Set-user-id bit: 1 if the file should be executed with the owner’s
permissions (shows up as “s” instead of “x” in “ls –l” output); 0 if
the file should be executed with launching user’s permissions$ ls -l a.out-rwxr-xr-x 1 mickens somegrp 524288 Jan 19 11:35
User
Group
World
Owner
Owner
group
Linux: Mapping Humans to Privileges• Each process has a bunch of IDs, including:
• Real UID: The UID of the process owner
• Effective UID: The UID that the kernel checks when validating
access permissions
• On fork(), the child inherits the UIDs of its parent
• On exec(progName), the process keeps its UIDs unless the
progName file has the set-user-ID bit set, in which case the
effective UID of the process is set to the UID of the binary’s owner
• Ex: The passwd command needs to update a user’s entry in
/etc/shadow file
• The /etc/shadow file is sensitive and should only be modified
by root, but regular users need to be able to update their
passwords!
• So, the passwd binary is owned by root, but has a set-user-ID
bit of 1
Making New Processes: fork()• fork() allows a parent process to create a child process
• Abstractly speaking, child gets a copy of parent’s address space
• Linux’s fork() uses copy-on-write pages: avoids synchronous
copy of the entire address space, and only incurs copy overhead
for pages which actually diverge between parent and child
Parentvirt addrspace
Childvirt addrspace
Physicalmemory
XYZ
XYZ
Parentvirt addrspace
Childvirt addrspace
Physicalmemory
XYZ
X*YZ
Child writes X
• BSD’s old fork() did full, synchronous copy
The Drunk Uncle Named BSD vfork()• vfork() was intended for the situation in which
the parent does a fork() and the child does an
“immediate” exec()
• After vfork(), parent is suspended; child
executes using parent’s address space and
thread-of-control until exec() is called
• When child calls exec(), BSD makes a new
address space for the child and copies file
descriptors, current working directory, etc. like
a regular fork()
• Child process is supposed to not modify
anything in the parent’s address space before
calling exec() or otherwise undefined demons
happen
• BSD BE SERIOUS UGGHHH IMPLEMENT COW
The Drunk Uncle Named BSD vfork()Problem: vfork()+exec() is not atomic!
Solution for priority inversion: Use
fork()+exec(), and implement COW to
make fork() fast.[fork()+exec() still isn’t atomic, but the
denial of service is now fixed since the
parent isn’t blocked waiting for the
child.]
Fun With Symlinks• Symbolic link: special file whose contents are name of another file
• Ex: link –s /etc/shadow /tmp/foo
• /tmp/foo is now an alias for /etc/shadow
• When a process P wants to read/write/execute /tmp/foo, the
kernel checks whether P has the appropriate access permissions for the underlying path /etc/shadow
• A program can check whether path refers to symlink or regular file
• Sounds pretty secure, right?
SO CLOSEWithout careful coding, a
program’s security checks
on a pathname are not
atomic with respect to the
dereferencing of pathname!
//Imagine that a mailserver runs with//root privileges.char *filename = "/home/loki/mailbox";char *new_msg = get_new_msg_for(“loki");if(new_msg == NULL){
return;}int new_msg_len = strlen(new_msg);
//Write the new message into Loki's//mailbox, but we're paranoid! Only//write the new message if Loki's//mailbox path isn't a symlink.struct stat lstat_info;int fd;lstat(filename, &lstat_info);if(!S_ISLNK(lstat_info.st_mode)){
fd = open(filename, O_RDWR);write(fd, new_msg, new_msg_len);close(fd);
}
//Window of vulnerability
$ rm /home/loki/mailbox$ ln –s /etc/shadow
/home/loki/mailbox
The password file will get overwritten with the contents of Loki’s new email. By the way, LOKI CAN EMAIL HIMSELF.
char *filename = "/home/loki/mailbox";char *new_msg = get_new_msg_for(“loki");if(new_msg == NULL){return;}int new_msg_len = strlen(new_msg);
struct stat lstat_info;struct stat fstat_info;int fd;
lstat(filename, &lstat_info); //Doesn’t follow//symlinks
fd = open(filename, O_RDWR); //*Does* follow//symlinks--“dereferences” symlink to get//fd which represents the symlink target
fstat(fd, &fstat_info);if (lstat_info.st_mode == fstat_info.st_mode &&
lstat_info.st_ino == fstat_info.st_ino &&lstat_info.st_dev == fstat_info.st_dev){write(fd, new_msg, new_msg_len);
}close(fd);
Ensures that the
filename used to
open the file
wasn’t a symlink!//Attacker window to corrupt the fd that will//be opened by the mailserver.
Other DefensesLinux added several new system
calls to reduce the number of race
conditions developers must check
//Ten directory race conditions//to check.int fd0 = open(“dirName/0.txt”);int fd1 = open(“dirName/1.txt”);
. . .int fd9 = open(“dirName/9.txt”);
//One directory race condition//to check.int dirFd = open(“dirName”);int fd0 = openat(dirFd, “0.txt”);int fd1 = openat(dirFd, “1.txt”);
. . .int fd9 = openat(dirFd, “9.txt”);
Once process has an fd for
a path, symlink shenanigans
won’t change the inode that
the file descriptor points to!
Other Defenses• There have been various research proposals to add
transactional support to file systems
• Regular journaling file systems provide all-or-nothing
transactions for individual file system calls
• We could add support for transactions which span
multiple file system calls (e.g., see TxOS)sys_xbegin();struct stat st;int fd;lstat(filename, &st_info);if(!S_ISLNK(st_info.st_mode)){
//Race window would ordinarily//be here, but transactions//are isolated from each other!fd = open(filename, O_RDWR);write(fd, buf, buf_len);close(fd);
}sys_xend();
//Process Ydo_evil()
Time Time
Time
Race Conditions: Non-atomic System Call Pairs
//Process Xsys_call0();sys_call1();
//Process Xsys_xbegin();sys_call0();sys_call1();sys_xend();
Txn Txn
Impossible!
L1 d-cacheL1 i-cache
L2 cache
L3 cache
L1 d-cacheL1 i-cache
L2 cache
RAM
mov 1 --> [x]mov [y] --> %eax
mov 1 --> [y]mov [x] --> %ebx
//Assume that memory //locations [x] and [y] //both initialized to 0
x86 allows both threads
to read 0 (which would
be impossible if both
threads run instructions
sequentially (but may be
interrupted)).
How Hardware Shows Its Love For Us• Modern processors do unholy things to improve performance
• Per-core store buffers avoid the need for a core to stall on
a synchronous write through the memory hierarchy
• A core may also reorder instructions (e.g., non-dependent
loads and stores) to increase parallelism of hardware usage
//Load is dependent on the//store, so hardware will //*not* reorder the memory//operations.x = 42;y = x;
How Hardware Shows Its Love For Us• Modern processors do unholy things to improve performance
• Per-core store buffers avoid the need for a core to stall on
a synchronous write through the memory hierarchy
• A core may also reorder instructions (e.g., non-dependent
loads and stores) to increase parallelism of hardware usage
//First corewhile(shouldStop == 0){;}
//Second coreanswer = 42;shouldStop = 1;printf("%d", answer);
No data dependency w.r.t.
the local core. So, hardware
may reorder the stores!
This code may not print 42!
It’s hard for developers to
find all of the places in
which memory-ordering
primitives are necessary!
Linux TLB Flushing Logic on x86• When the kernel changes a page table mapping, the kernel
must flush the local core’s TLB to remove any stale entry
• On a multi-core system, the local core must also send an IPI
to other cores so that those cores can flush their TLBs too
Race Condition in Linux’s ELF Loader
• Just like OS161, Linux has code to load an ELF binary into
memory, parse it, and initialize various in-memory regions
for the stack, heap, and code
• Each process structure has a struct mm_structrepresenting the process’ address space
• Kernel must grab mm_struct::mmap_sem before
modifying the process’ regions (there’s one structvm_area_struct for each region)
struct mm_struct{struct vm_area_struct *mmap; /* List of VMAs */pgd_t *pgd; /* Page table directory pointer,
* i.e., the thing that gets* assigned to %cr3 */
struct rw_semaphore mmap_sem; /* Protects VMAs *///...etc...
};
//In Linux 2.4.28, the mmap_sem lock was released too early!static int load_elf_library(struct file *file){
down_write(¤t->mm->mmap_sem);//Update the code VMA for the ELF library.error = do_mmap(file,
ELF_PAGESTART(elf_phdata->p_vaddr),(elf_phdata->p_filesz +
ELF_PAGEOFFSET(elf_phdata->p_vaddr)),PROT_READ | PROT_WRITE | PROT_EXEC,MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,(elf_phdata->p_offset -
ELF_PAGEOFFSET(elf_phdata->p_vaddr)));up_write(¤t->mm->mmap_sem);if(error != ELF_PAGESTART(elf_phdata->p_vaddr))
goto out_free_ph;
elf_bss = elf_phdata->p_vaddr + elf_phdata->p_filesz;padzero(elf_bss);
len = ELF_PAGESTART(elf_phdata->p_filesz + elf_phdata->p_vaddr +ELF_MIN_ALIGN - 1);
bss = elf_phdata->p_memsz + elf_phdata->p_vaddr;if(bss > len) //Create the data VMA for the ELF library.
do_brk(len, bss - len); //do_brk() assumes mmap_sem is held!
unsigned long do_brk(unsigned long addr, unsigned long len){//Allocate a new VMA to represent the new//extension to the process’ address space.struct vm_area_struct *vma = kmem_cache_alloc(vm_area_cachep,
SLAB_KERNEL);if(!vma)
return -ENOMEM;
//Update VMA bookkeeping.vma->vm_mm = mm;vma->vm_start = addr;vma->vm_end = addr + len;vma->vm_flags = flags;vma->vm_page_prot = protection_map[flags & 0x0f];//Other bookkeeping values updated . . .
//Add the new VMA to the process’ set of VMAs//(the VMAs are stored in a red-black tree).vma_link(mm, vma, prev, rb_link, rb_parent);
return addr;}
The kernel may
sleep during
memory alloc!
unsigned long do_brk(unsigned long addr, unsigned long len){//Allocate a new VMA to represent the new//extension to the process’ address space.struct vm_area_struct *vma = kmem_cache_alloc(vm_area_cachep,
SLAB_KERNEL);if(!vma)
return -ENOMEM;
//Update VMA bookkeeping.vma->vm_mm = mm;vma->vm_start = addr;vma->vm_end = addr + len;vma->vm_flags = flags;vma->vm_page_prot = protection_map[flags & 0x0f];//Other bookkeeping values updated . . .
//Add the new VMA to the process’ set of VMAs//(the VMAs are stored in a red-black tree).vma_link(mm, vma, prev, rb_link, rb_parent);
return addr;}
Attacker launches a program with several
threads that allocate a lot of memory and
call sys_uselib() with a carefully-crafted ELF
binary. Goal: Trigger race condition and
corrupt memory in attacker-controlled way!
• rb_link and rb_parent are pointers which
vma_link() uses to write kernel memory
• By controlling how the race condition
corrupts memory, the attacker can
overwrite pointers with attacker-controlled
values
• Later, when kernel writes memory through
corrupted pointers, the kernel will write to
memory addresses of the attacker’s
choosing
• If the attacker supplies the data for the
write, the attacker also chooses the new
contents of those memory addresses
• Ex: Overwriting a kernel function pointer to
trick the kernel into invoking the wrong
function
vma_link(mm, vma, prev, rb_link, rb_parent);
Kernel struct
[funcPtr]
[otherStuff ]
Kernel function
foo()
Kernel function
bar()
Corrupted ptr
which kernel
writes to
Virtual mem
• rb_link and rb_parent are pointers which
vma_link() uses to write kernel memory
• By controlling how the race condition
corrupts memory, the attacker can
overwrite pointers with attacker-controlled
values
• Later, when kernel writes memory through
corrupted pointers, the kernel will write to
memory addresses of the attacker’s
choosing
• If the attacker supplies the data for the
write, the attacker also chooses the new
contents of those memory addresses
• Ex: Overwriting a kernel function pointer to
trick the kernel into invoking the wrong
function
vma_link(mm, vma, prev, rb_link, rb_parent);
Kernel struct
[funcPtr]
[otherStuff ]
Kernel function
foo()
Kernel function
bar()
Corrupted ptr
which kernel
writes to
Virtual mem
HOW DO WE
PREVENT THIS?• In the case of this specific ELF
vulnerability, the fix is to hold
mmap_sem while the data region for
the library is created//Also, in do_brk() . . ./** mm->mmap_sem is required to * protect against another thread* changing the mappings while we * sleep (on kmalloc for one).*/verify_mmap_write_lock_held(mm);
HOW DO WE
PREVENT THIS?• In the case of this specific ELF
vulnerability, the fix is to hold
mmap_sem while the data
region for the library is created
• However, memory corruption
vulnerabilities are often tricky to
prevent; they represent a large
fraction of all security holes!