Using System-Specific Compiler Extensions to Find Errors in Systems Code
Dawson Engler, Ben Chelf, Andy Chou, Seth Hallem
Stanford University
This talk is about how to find hundreds of errors by putting just a little system-specific knowledge into a compiler.
Checking systems software
Systems software has many ad-hoc restrictions:
– “acquire lock L before accessing shared variable X”
– “do not allocate large variables on 6K kernel stack”
Error = crashed system. How to find errors?
– Formal verification
» + rigorous
» - costly. *Very* rare to do for software
– Testing:
» + simple, few false positives
» - requires running code: doesn’t scale & can be impractical
– Manual inspection
» + flexible
» - erratic & doesn’t scale well.
– What to do??
Never do X (do not use floating point, do not allocate large vars on the 6K kernel stack). Always do X before/after Y (acquire lock before use, release after use).
Formal verification is practically intractable: far too strenuous, and even if you do it, the spec isn’t the code.
Method of choice if you build systems for money: test. Problem: O(paths) is exponential in the length of the code. So what you wind up with is a system that only crashes after a week. Further, mapping a crash back to its cause can be *really* hard.
Inspection is dead: modify the code and you have to do it again.
Another approach
Observation: rules can be checked with a compiler
– scan source for “relevant” acts, check if they make sense
– E.g., to check “disabled interrupts must be re-enabled”: scan for calls to disable()/enable(), check that they match, not done twice
Main problem:
– compiler has machinery to automatically check, but not knowledge
– implementor has knowledge but not machinery
Meta-level compilation (MC):
– give implementors a framework to add easily-written, system-specific compiler extensions
We want to make compilers aggressively system-specific. If you design a system or interface and see a use of it, you invariably see ways of doing it better. MC gives you a way to articulate this knowledge and have the compiler apply it for you automatically.
Meta-level compilation (MC)
Implementation:
– Extensions dynamically linked into GNU g++ compiler
– Applied down all paths in input program source
– E.g. 64-line extension to check disable/enable (82 bugs)
Static detection of real errors in real systems:
– 600+ bugs in Linux, OpenBSD, FLASH, Xok exokernel
– most extensions < 100 lines, written by system outsiders
/* Linux: raid5.c */
save(flags);
cli();
if (!(buf = alloc()))
    return NULL;
restore(flags);
return buf;
[Figure: Linux raid5.c → GNU C++ compiler + interrupt checker → “did not re-enable interrupts!”]
Actual error in Linux: the raid5 driver disables interrupts and then, if it fails to allocate a buffer, returns with them still disabled. This kernel deadlock is actually hidden by an immediate segmentation fault, since the callers dereference the pointer without checking for NULL.
The main results: it works really well, and it’s easy.
A bit more detail
{ #include "linux-includes.h" }
sm chk_interrupts {
    decl { unsigned } flags;

    // named patterns
    pat enable = { sti(); } | { restore_flags(flags); };
    pat disable = { cli(); };

    // states
    is_enabled:
        disable ==> is_disabled
      | enable ==> { err("double enable"); }
      ;
    is_disabled:
        enable ==> is_enabled
      | disable ==> { err("double disable"); }
      | $end_of_path$ ==> { err("exiting w/intr disabled!"); }
      ;
}
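[State diagram: initial state is_enabled. disable: is_enabled → is_disabled. enable: is_disabled → is_enabled. enable in is_enabled, disable in is_disabled, or end-of-path in is_disabled → error.]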
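The chk_interrupts state machine can be mimicked in plain C to see what it does on a single path. This is only a sketch: the event encoding, the `verdict` names, and `check_path` are hypothetical, and the real extension runs inside the compiler over every path rather than over a hand-built array.

```c
#include <assert.h>
#include <stddef.h>

/* Events a checker would extract from one path through a function
 * (hypothetical encoding for this sketch). */
enum event { EV_DISABLE, EV_ENABLE };

/* Possible outcomes for a path. */
enum verdict { PATH_OK, DOUBLE_ENABLE, DOUBLE_DISABLE, EXIT_DISABLED };

/* Run the two-state machine (is_enabled / is_disabled) over one path. */
enum verdict check_path(const enum event *path, size_t n)
{
    int enabled = 1;                    /* initial state: is_enabled */
    for (size_t i = 0; i < n; i++) {
        if (path[i] == EV_DISABLE) {
            if (!enabled)
                return DOUBLE_DISABLE;  /* disable while is_disabled */
            enabled = 0;
        } else {
            if (enabled)
                return DOUBLE_ENABLE;   /* enable while is_enabled */
            enabled = 1;
        }
    }
    /* $end_of_path$ reached while still disabled is an error. */
    return enabled ? PATH_OK : EXIT_DISABLED;
}
```

The failing path in the raid5 example corresponds to a path containing only `EV_DISABLE`, which ends in `EXIT_DISABLED`.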
“X before Y” rule: system call pointers
Applications are evil
– OS must check all input pointers before use
– one missing check = security hole
MC checker:
– Bind syscall ptr’s to “tainted” state
– tainted vars only touched w/ “safe” routines
– or: explicit check to make “clean”
/* from sys/kern/disk.c */
int sys_disk_request(… struct buf *reqbp, u_int k) {
    ...
    /* bypass for direct scsi commands */
    if (reqbp->b_flags & B_SCSICMD)
        return sys_disk_scsicmd(sn, k, reqbp);
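[State diagram: each input ptr P starts in P.tainted. check(p): P.tainted → P.clean. copyin(p), copyout(p) are allowed in P.tainted. use(p) in P.tainted → error. use(p) in P.clean is allowed.]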
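The per-pointer state machine for tainted system call pointers can be sketched the same way. The names `ptr_state`, `action`, and `taint_step` are hypothetical and exist only for illustration; the actual checker binds state to pointer variables inside the compiler.

```c
#include <assert.h>

/* State attached to each system-call pointer argument. */
enum ptr_state { TAINTED, CLEAN };

/* Things a path can do with the pointer. */
enum action { ACT_CHECK, ACT_COPYIN, ACT_COPYOUT, ACT_USE };

/* Apply one action to the pointer's state.
 * Returns 1 if the action violates the rule, 0 otherwise. */
int taint_step(enum ptr_state *st, enum action a)
{
    switch (a) {
    case ACT_CHECK:
        *st = CLEAN;            /* explicit check makes the pointer clean */
        return 0;
    case ACT_COPYIN:
    case ACT_COPYOUT:
        return 0;               /* "safe" routines accept tainted pointers */
    case ACT_USE:
        return *st == TAINTED;  /* raw use of a tainted pointer is an error */
    }
    return 0;
}
```

In these terms, the sys_disk_request example is an `ACT_USE` (the `reqbp->b_flags` dereference) on a pointer that is still `TAINTED`.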
Deriving specification from common usage
Problem: difficult to specify all user pointers
– so: see what code usually does, deviations probably errors
– if ever pass ptr to paranoid routine, make sure always do
Found 5 security errors in Linux.
– Canonical example: hole in an “ioctl” routine for some obscure device driver.
/* drivers/usb/evdev.c */
static int evdev_ioctl(..., unsigned long arg) {
    ...
    switch (cmd) {
    case EVIOCGVERSION:
        return put_user(EV_VERSION, (__u32 *) arg);
    case EVIOCGID:
        /* copy_to_user(to, from)! */
        return copy_to_user(&dev->id, (void *) arg, sizeof(struct input_id));
Drivers pull user pointers from all sorts of random locations.
Kernel alloc/dealloc rules
Must check that alloc succeeded
Must allocate enough space
Must not use after free()
Must free alloc’d object on error:
/* from drivers/char/tea6300.c */
client = kmalloc(sizeof *client, GFP_KERNEL);
if (!client)
    return -ENOMEM;
...
tea = kmalloc(sizeof *tea, GFP_KERNEL);
if (!tea)
    return -ENOMEM;
...
MOD_INC_USE_COUNT; /* bonus bug: kmalloc could sleep */
47 error-path leaks, 48 false positives. 72 mod_inc/mod_dec errors.
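The “must free alloc’d object on error” rule can be illustrated with a user-space analogue of the tea6300 pattern. This is a sketch, not the driver fix: `struct pair`, `attach`, the `alloc_fn` hook, and the `fail_second` allocator are hypothetical names, and `malloc`/`free` stand in for `kmalloc`/`kfree`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Two objects that must both be allocated, as in the driver example. */
struct pair { void *client; void *tea; };

/* Allocator hook so the failure path can be exercised (assumption). */
typedef void *(*alloc_fn)(size_t);

static int alloc_calls;

/* Allocator that succeeds on the first call and fails on the second. */
void *fail_second(size_t n)
{
    return ++alloc_calls == 2 ? NULL : malloc(n);
}

int attach(struct pair *out, alloc_fn alloc, size_t csz, size_t tsz)
{
    void *client = alloc(csz);
    if (!client)
        return -ENOMEM;

    void *tea = alloc(tsz);
    if (!tea) {
        free(client);   /* the release the original error path lacks */
        return -ENOMEM;
    }

    out->client = client;
    out->tea = tea;
    return 0;
}
```

With `fail_second` as the allocator, `attach` returns `-ENOMEM` after releasing the first object instead of leaking it.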
Stripped-down kernel malloc/free checker
decl { scalar } sz;          // match any scalar
decl { const int } retv;     // match const ints
state decl { any_ptr } v;    // match any pointer, can bind to a state

// Bind malloc results to “unknown” until observed
start:
    { v = (any)malloc(sz) } ==> v.unknown
  | { free(v) } ==> v.freed;

// can compare in states unknown, null, not_null
v.unknown, v.null, v.not_null:
    { (v == 0) } ==> true = v.null, false = v.not_null
  | { (v != 0) } ==> true = v.not_null, false = v.null;

// Cannot reach error path with unknown or not-null
v.unknown, v.not_null:
    { return retv; } ==> { if(mgk_int_cst(retv) < 0) err("Error path leak!"); };

// No dereferences of null, freed, or unknown ptrs.
v.null, v.freed, v.unknown:
    { *(any *)v } ==> { err("Using ptr illegally!"); };
Interesting because it’s so small: a few lines of code and we have a path-sensitive, high-level analysis pass.
Some amusing bugs
No check (130 errors, 11 false pos). Worst case (many uses):
/* include/linux/coda_linux.h:CODA_ALLOC */
ptr = (cast)vmalloc((unsigned long) size);
...
if (ptr == 0)
    printk("kernel malloc returns 0\n");
memset(ptr, 0, size);
Use after free (14 errors, 3 false pos): 5 cut&paste of
/* drivers/isdn/pcbit:pcbit_init_dev */
kfree(dev);
iounmap((unsigned char*)dev->sh_mem);
release_mem_region(dev->ph_mem, 4096);
Wrong size (2 errors):
/* drivers/parport/daisy.c:add_dev:50 */
newdev = kmalloc(GFP_KERNEL, sizeof(struct daisydev));
“In context Y, don’t do X”: blocking
Linux: if interrupts are disabled, or spin lock held, do not call an operation that could block.
MC checker:
– Compute transitive closure of all potentially blocking fn’s
– Hit disable/lock: warn of any calls
– 123 errors, 8 false pos
/* drivers/net/pcmcia/wavelan_cs.c */
spin_lock_irqsave(&lp->lock, flags);                 /* 1889 */
switch (cmd)
...
case SIOCGIWPRIV:
    ...
    if (copy_to_user(wrq->u.data.pointer, …))        /* 2305 */
        ret = -EFAULT;
[State diagram: disable() or lock(l): clean → NoBlock. enable() or unlock(l): NoBlock → clean. Blocking call in NoBlock → error.]
The violation comes 400 lines after the lock is taken. This is a common pattern: the implementor just doesn’t know about the rule, so keeps violating it. It happens because rules are manually enforced and poorly documented.
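The “transitive closure of all potentially blocking functions” step can be sketched as a fixed-point iteration over a call graph. The adjacency-matrix representation, the `NFUNCS` size, and `blocking_closure` are assumptions made for this example; the source does not say how the checker stores the call graph.

```c
#include <assert.h>

#define NFUNCS 5

/* calls[i][j] != 0 means function i calls function j.
 * blocks[i] != 0 on entry marks functions known to block directly;
 * on return it also marks every function that can reach one of them. */
void blocking_closure(int calls[NFUNCS][NFUNCS], int blocks[NFUNCS])
{
    int changed = 1;
    while (changed) {                   /* iterate to a fixed point */
        changed = 0;
        for (int i = 0; i < NFUNCS; i++) {
            if (blocks[i])
                continue;
            for (int j = 0; j < NFUNCS; j++) {
                if (calls[i][j] && blocks[j]) {
                    blocks[i] = 1;      /* i calls something that may block */
                    changed = 1;
                    break;
                }
            }
        }
    }
}
```

Once the closure is known, the checker only has to warn about any call to a marked function while in the NoBlock state.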
Example: statically checking assert
Assert(x) used to check “x” at runtime. Abort if false
– compiler oblivious, so cannot analyze statically
– Use MC to build an assert-aware extension
Result: found 5 errors in FLASH.
– Common: code cut&paste from other context
– Manual detection questionable: 300-line path explosion between violation and check
General method to push dynamic checks to static
msg.len = 0;
...
assert(msg.len != 0);
[Figure: code → assert checker → “line 211: assert failure!”]
Another way to use MC is to push dynamic checks to static ones. Code usually does some amount of dynamic checking, with a series of if statements at the beginning of a routine to test for error conditions. So just pull those conditions into the compiler and check them there.
Result overview

Check           Errors   False pos   LOC
Static assert   5        0           100
Stack check     10+      0           53
Allocation      184      64          60
Blocking        123      8           131
Module race     ~75      2           133
Mutex           82       201         64
FLASH           34       69          553
Total           ~545     ~226        ~1100
(+others)
High bits: small number of LOC = big bug count, roughly 2:1 ratio of errors to false positives.
Conclusions
MC goal: make programming much more powerful
– How: Raise compilation from level of programming language to the “meta level” of the systems implemented in that language
MC works well in real, heavily tested systems
– We found bugs in every system we’ve looked at.
– Over 600 bugs in total, many capable of crashing system
– Easily written by people unfamiliar w/ checked system
Currently:
– making correctors, using domain-knowledge to extract verifiable specs, deriving errors by usage deviations, performing meta-level optimization…
Given a set of uses of some interface you’ve built, you invariably see better ways of doing things. This gives you a way to articulate this knowledge and have the compiler apply it for you automatically. Let one person do it.
Meta-level compilation:
– Make compilers aggressively system-specific
– Easy: digest sentence fragment, write checker/optimizer.
– Result: Static, precise, immediate error diagnosis
As outsiders, found errors in every system looked at
Over 600 bugs, many capable of crashing system
Bugs as deviant behavior
One way to find bugs:
– have a deep understanding of code semantics, detect when code makes no sense. Hard.
Easier:
– see what code usually does: deviations probably bugs
– x protected by lock(a) 1000 times, by lock(b) once, probably an error
– Find inverses by looking for common pairings
– More general: derive temporal orderings. Use machine learning to derive more sophisticated patterns?
lock(a);x++; unlock(a);
lock(a);x++; unlock(a);
lock(a);x++; unlock(a);
lock(a);x++; unlock(a);
lock(b);x++; unlock(b);
lock(a);x++; unlock(a);
Other uses: infer system call polarity (return < 0 or > 0 on error): found 8 places in Linux. If code ever checks a routine for failure, make sure it always checks for failure. See what code does after calling a security check (suser): if it does something a lot and then deviates, whine. Weakness: cannot catch code that always does the wrong thing or never does the right thing.
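The “protected by lock(a) 1000 times, by lock(b) once” heuristic amounts to counting pairings and flagging rare ones. The sketch below assumes the counts have already been gathered per lock for a single variable; `NLOCKS`, `find_deviant_lock`, and the two thresholds are hypothetical choices for illustration, not values from the talk.

```c
#include <assert.h>

#define NLOCKS 4

/* counts[l] = number of times the variable was accessed while holding lock l.
 * Returns the index of a lock that looks deviant, or -1 if none: a lock used
 * at least once but at most max_rare times, while the most common lock was
 * used at least min_common times. */
int find_deviant_lock(const int counts[NLOCKS], int min_common, int max_rare)
{
    int best = 0;
    for (int l = 1; l < NLOCKS; l++)
        if (counts[l] > counts[best])
            best = l;

    if (counts[best] < min_common)
        return -1;              /* no dominant convention to deviate from */

    for (int l = 0; l < NLOCKS; l++)
        if (l != best && counts[l] > 0 && counts[l] <= max_rare)
            return l;           /* rarely used lock: probably an error */

    return -1;
}
```

When usage is evenly split between two locks the function reports nothing, which mirrors the weakness noted above: with no dominant behavior there is no deviation to flag.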
What to do when static analysis too weak?
Static analysis works in some cases, not well in others
– hit undecidable problems with loop termination conditions, data values, pointers,…
Alternative:
– use domain-specific slicing to extract spec from code
– run through verifier
Main lever: a little domain knowledge goes a long way
– e.g., strip out Linux TCP finite-state-machine by keying off of variable “sk->state”
– Real example: checking FLASH code
Extracting specs from FLASH code
Embedded sw for cache coherence in FLASH machine
– errors crash or deadlock machine: can take week to track
– typical protocol: 18K lines of hairy C code
Extract specifications from source by simple slicing
– found 9 errors in code
– despite 5+ years of heavy testing and formal verification!
How?
– Given list of data structure fields and message operations, slice out all relevant operations
– Compose with specification (manual) boilerplate
– run through Murphi model checker
– Levers: aliasing and globals, but in a stylized way that we can mostly ignore. 4 loops in code.
Deeply nested control structures (29 conditional compilation directives, 21 if statements)
FLASH vs Murphi
Murphi:
nh.len := len_cacheline;
if ((DH.Pending = 0)) then
  if ((DH.Dirty = 0)) then
    assert(nh.len != len_nodata);
    mbResult := pi_send_func(src, PI_Putt);
    DH.Local := 1;
  else
    assert((DH.List = 0));
    assert((DH.RealPtrs = 0));
    …
FLASH:
HANDLER_GLOBALS(header.nh.len) = LEN_CACHELINE;
if (! HANDLER_GLOBALS(h.hl.Pending)) {
    if (! HANDLER_GLOBALS(h.hl.Dirty)) {
        ASSERT(!HANDLER_GLOBALS(h.hl.IO));
        PI_SEND(F_DATA, F_FREE, F_SWAP,…,…);
        HANDLER_GLOBALS(h.hl.Local) = 1;
        /* ... deleted 14 lines */
    } else {
        ASSERT(!HANDLER_GLOBALS(h.hl.List));
        ASSERT(!HANDLER_GLOBALS(h.hl.RealPtrs));
Checkers into Correctors
Problem: big system, lots of bugs
– may not be your system or take too long to fix manually
Can turn some classes of checkers into correctors:
– “Do not allocate large variables on kernel stack”: if you hit a violation, rewrite code to dynamically allocate var
– “Do not call blocking memory allocator with interrupts disabled”: hoist allocation out
– “On error paths, rollback side-effects”: dynamically track what these are, and reverse.
Interesting: trade dynamic checks for simplicity
Before:
cli();
p = malloc(sizeof *p);
sti();
After:
tmp = malloc(sizeof *p);
cli();
p = tmp;
sti();
Malloc: the corrector can insert a null pointer check (if it can figure out the right value to return) or can preallocate. It can also hoist mod_inc/dec above sleeping operations.
Empirical observation: we have sent out hundreds of bug reports. 20-30 have gotten fixed.
MC optimization
Optimization rules similar to checking:
– “if data is not shared with interrupt handlers, protect using spin locks rather than interrupt disable/enable”
– “to save an instruction when setting a message opcode, xor it with the new and old” (msg.opcode ^= (new ^ old));
– “replace quicksort with radix sort when sorting integers”
Common rule: “In situation X, do Y rather than Z”:
– “if a variable is not modified, protect using read locks”
– and with a few lines: change opt into checker:
Before:
lock(q->lock);
head = q->head;
n = q->nelem;
unlock(q->lock);
After (optimized):
read_lock(q->lock);
head = q->head;
n = q->nelem;
read_unlock(q->lock);
As a checker:
read_lock(q->lock);
q->head = h;
→ “modifying q with read lock!”
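The xor rule for message opcodes works because x ^ (old ^ new) equals new whenever x equals old, and it leaves bits outside the opcode untouched. A minimal sketch, assuming a hypothetical header word with the opcode in the low 8 bits (the real FLASH message layout is not given in the source):

```c
#include <assert.h>
#include <stdint.h>

/* Replace the opcode stored in the low 8 bits of a header word.
 * Only valid when the current opcode is known to be old_op; when both
 * opcodes are compile-time constants, (old_op ^ new_op) folds to a single
 * constant and the update is one xor. */
uint32_t swap_opcode(uint32_t hdr, uint8_t old_op, uint8_t new_op)
{
    hdr ^= (uint32_t)(old_op ^ new_op);
    return hdr;
}
```

If the header does not actually hold `old_op`, the result is garbage rather than `new_op`, which is why this needs a compiler that knows the opcode's current value at that point.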
MC analysis vs. traditional compiler analysis
Meaning more apparent + domain-specific knowledge
Easier to bound side-effects: use knowledge of abstract state to ignore many concrete actions
Aliasing less of a problem
– typical: opaque handles vs normal mess of pointers
Operations more coarse grain
– read()/write() vs load/store; matrix ops vs +/-
Bigint a, b, c;
set(a, 3);
mul(b, a, a);
mul(c, b, b);
printf("%s", bigint_to_str(c));
→ printf("81");
Interfaces let compilers treat the implementation as a black box, just like programmers do!
A traditional analysis would have to follow bunches of pointers and memory allocations. Instead, we know that the operation does a +, a -, or whatever, and we can ignore how. Similarly, we can ignore pointers.
Many compiler problems are hard because they are static; that is not a problem with runtime checking. Less control is needed: we are pushing around function calls rather than doing register allocation.