Static Verification of Memory Safety for Device Drivers
Scott McPeak George Necula
UC Berkeley
OSQ Retreat, 5/15/03
Goal: Eliminate Low-level Bugs
• What properties?
  – Memory safety, including dangling references
  – Calling order restrictions (open/read, locks)
• Why?
  – Implicit spec
  – Shallow correctness proofs
  – Difficult to test and debug
  – Deallocation: an interesting heap assertion
Why Annotate?
• Exploit programmer knowledge
  – The information is there; the issue is convenience.
• Record design decisions
  – Would you program without type declarations?
• Matter of cost/benefit
  – Annotation only needs to be more efficient than testing and debugging (for these kinds of bugs)
Why is verification hard?
• Constructing models of the code
  – Choice of abstraction is very important
• Proving true predicates
  – Automatic vs. interactive theorem provers
• Knowing how to describe the program's heap invariants
  – Hard concepts: reachability, lack of sharing, topmost in some set, acyclic, ...
Automatic Modeling
• Translate C into a language without pointers
  – Simulate them with sequences, etc.
• Example: uniform semantic model
    M : int → int
    [x = e]  = x := [e]
    [*p]     = sel(M, p)
    [*p = e] = M := upd(M, p, [e])
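The uniform model can be mimicked directly in C. The following is a minimal sketch, not the tool's implementation: memory is a single map M with functional sel and upd operations. The names Mem and MEM_SIZE and the fixed address bound are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_SIZE 16  /* hypothetical size of the modeled address space */

/* The whole memory is one map M from integer addresses to integer values. */
typedef struct { int cell[MEM_SIZE]; } Mem;

/* sel(M, p): read the value stored at address p. */
static int sel(const Mem *m, size_t p) {
    assert(p < MEM_SIZE);
    return m->cell[p];
}

/* upd(M, p, v): return a map equal to M except that p maps to v. */
static Mem upd(Mem m, size_t p, int v) {
    assert(p < MEM_SIZE);
    m.cell[p] = v;  /* m is a by-value copy, so the caller's map is untouched */
    return m;
}
```

Under this reading, the C statement `*p = e` becomes `M = upd(M, p, e)`, and the old map remains available for reasoning about the pre-state.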
Custom Semantic Models
• User provides the mapping from C as a set of pattern rules
• Can supply multiple mappings for the same syntax; choose by: type, name, module, etc.
• Can map from languages other than C
• Bridges gap between hand-constructed models and the real code
Example Models
• Java-like model
    class Foo { int x; int y; };
    [p->x]     = sel(Foo_x, p)
    [p->x = e] = Foo_x := upd(Foo_x, p, [e])
• Functional model
    [cons(x,y)] = cons([x], [y])
    [cdr(p)]    = cdr([p])
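A small C sketch of the Java-like model, assuming one map per field indexed by object address. State, FieldMap, MAX_OBJS and assign_x are hypothetical names introduced here. The point it illustrates is that a write to field x cannot disturb field y, because the two fields live in separate maps.

```c
#include <assert.h>

#define MAX_OBJS 8   /* hypothetical bound on object addresses */

/* One map per field of Foo, indexed by object address. */
typedef struct { int val[MAX_OBJS]; } FieldMap;

typedef struct {
    FieldMap Foo_x;  /* models field x of every Foo */
    FieldMap Foo_y;  /* models field y of every Foo */
} State;

static int sel(const FieldMap *f, int p) {
    assert(0 <= p && p < MAX_OBJS);
    return f->val[p];
}

static FieldMap upd(FieldMap f, int p, int v) {
    assert(0 <= p && p < MAX_OBJS);
    f.val[p] = v;    /* f is a copy; the result is returned to the caller */
    return f;
}

/* [p->x = e]  =  Foo_x := upd(Foo_x, p, [e]) */
static void assign_x(State *s, int p, int e) {
    s->Foo_x = upd(s->Foo_x, p, e);
}
```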
Integers
• 1. Simplify/simplex: rationals with integer heuristics
  – Practical, but unsound upon overflow
• 2. Same, but add no-overflow assertions
  – Sound, but perhaps little gain for effort
• 3. Integers with arithmetic mod 2^32
  – For specialized circumstances
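To make options 2 and 3 concrete, here is a hedged C sketch. add_in_range is a hypothetical helper expressing the no-overflow side condition that option 2 would assert before each signed addition; add_mod32 shows arithmetic mod 2^32, which unsigned 32-bit arithmetic provides.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Option 2: the side condition "a + b does not overflow int",
   computed without performing the possibly overflowing addition. */
static bool add_in_range(int a, int b) {
    if (b > 0)
        return a <= INT_MAX - b;
    return a >= INT_MIN - b;
}

/* Option 3: addition modulo 2^32. The cast keeps the result in
   32 bits even on a platform where int is wider than 32 bits. */
static uint32_t add_mod32(uint32_t a, uint32_t b) {
    return (uint32_t)(a + b);
}
```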
Semantic Model Soundness
• semantics: program → abort / not abort
  – defines a language of non-aborting programs
• Need a trusted model, e.g. uniform semantics derived from C99 standard
• Custom model: uses higher-level concepts
• Soundness: custom ⊆ trusted (every program that does not abort under the custom model also does not abort under the trusted model)
Proving True Predicates
• Automatic theorem provers are incomplete
  – Our conclusion: not possible to avoid incompleteness through clever choice of model
• Interactive provers are tedious
• Combination proof system
  – Try with automatic prover
  – If it fails, prove with interactive prover
  – Yield result as a new lemma for the automatic prover
Describing Heap Invariants
• Label every object with a predicate name
  – At least one name for each type
  – Names for intermediate states, e.g. initialization
  – Break recursive invariants
• For every pointer, make a back pointer
  – Powerful, natural, local
  – Tree structure: only one back pointer
  – "Threaded heap"
Example: Threaded Heap
define global_inv() {
  forall(Scull_Dev *p).
    tag[p] == Scull_Dev_tag
    ==>
    p->next != NULL ==> p->next->prev == p;
  /* ... */
}
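For intuition, the back-pointer clause of the invariant can also be written as an ordinary runtime check. This is only an illustrative sketch: the verifier proves the property statically for all tagged objects, whereas this function walks a single list. The cut-down Scull_Dev struct and the name back_pointers_ok are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down node type; the real Scull_Dev has more fields. */
typedef struct Scull_Dev {
    struct Scull_Dev *next;
    struct Scull_Dev *prev;   /* back pointer named by the invariant */
} Scull_Dev;

/* Returns true if every node reachable from head satisfies
   p->next != NULL ==> p->next->prev == p.
   Assumes the list is acyclic, otherwise the walk does not terminate. */
static bool back_pointers_ok(const Scull_Dev *head) {
    for (const Scull_Dev *p = head; p != NULL; p = p->next) {
        if (p->next != NULL && p->next->prev != p)
            return false;
    }
    return true;
}
```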
Test case: "scull" driver
• An example driver; ~500 lines of C
• Extensive use of the heap, online allocation and deallocation
• Polymorphic use of file.private_data
• Array indices computed with div/mod
• Reactive (state transitions)
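As an illustration of why div/mod indexing calls for integer range reasoning, the sketch below splits a file position into a list item, an array slot and a byte offset. The names locate, quantum and qset and the exact layout are assumptions for this example rather than details taken from the slides. The two trailing assertions are the facts a verifier would need to prove before the indices are used.

```c
#include <assert.h>
#include <limits.h>

typedef struct {
    long item;   /* which list node */
    long s_pos;  /* which slot in that node's pointer array */
    long q_pos;  /* byte offset inside the slot's buffer */
} Pos;

/* Split a non-negative position using div/mod. */
static Pos locate(long pos, long quantum, long qset) {
    assert(pos >= 0 && quantum > 0 && qset > 0);
    assert(quantum <= LONG_MAX / qset);   /* quantum * qset cannot overflow */
    long itemsize = quantum * qset;
    long rest = pos % itemsize;
    Pos r;
    r.item = pos / itemsize;
    r.s_pos = rest / quantum;
    r.q_pos = rest % quantum;
    /* The range facts that make the array accesses safe. */
    assert(0 <= r.s_pos && r.s_pos < qset);
    assert(0 <= r.q_pos && r.q_pos < quantum);
    return r;
}
```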
"scull" results
• Verified after two days of work:
  – Casts, array accesses, deallocations
• Two bugs
  – Incorrect interpretation of return value
  – Security hole: read another proc's old data
• As much annotation as code
  – Already have techniques to eliminate 75% of it
Future Work
• Continued improvement in annotations
  – Aggregation, data hiding in "changes" clauses
  – Built-in back pointers
  – Left half / right half approach to array loops
  – Type qualifiers to distribute predicates
  – User-written annotation agents
  – Split memory into regions
  – Incorporate other heap shape formalisms
Conclusion
• Reason for optimism in each problem area
  – modeling: user chooses the abstraction
  – proving: use interactive and automatic together
  – describing: predicate labels and back pointers are a start
• The challenge is not one of technology, but of communication
Vision
• Programmers know why their programs are (supposed to be) correct
  – Explanation will be in English, however
• Offer a way to conveniently express these reasons, then check them
• Make verification practical!
Our Approach
• Symbolic execution and strongest postcondition, with non-uniform semantics
• Explicit annotation at cutpoints: function boundaries, loop invariants
• Sound
• Linguistic innovation
Example Models
• Java-like model
    class Foo { int x; int y; };
    [p->x]     = sel(Foo_x, p)
    [p->x = e] = Foo_x := upd(Foo_x, p, [e])
• Restricted form of interior pointers
    [y = f(&p->x)] = temp := f(sel(Foo_x, p));
                     y := first(temp);
                     Foo_x := .. second(..) ..
• Functional model
    [cons(x,y)] = cons([x], [y])
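The interior-pointer rule can be pictured as turning a by-reference call into a call that takes the cell's value and returns a pair. The sketch below is one possible reading, not the authors' definition: the slide elides the final update, and writing second(temp) back into Foo_x at p is an assumption, as are the names Pair, f_model and call_f_on_x and the behavior chosen for f.

```c
#include <assert.h>

#define MAX_OBJS 8   /* hypothetical bound on object addresses */

typedef struct { int val[MAX_OBJS]; } FieldMap;

/* (return value, new value of the cell that was passed by reference) */
typedef struct { int first; int second; } Pair;

/* Pointer-free model of a hypothetical C function
   int f(int *cell) { int old = *cell; *cell = old + 1; return old; } */
static Pair f_model(int cell) {
    Pair r = { cell, cell + 1 };
    return r;
}

/* Model of  y = f(&p->x)  under the assumed write-back reading. */
static int call_f_on_x(FieldMap *Foo_x, int p) {
    assert(0 <= p && p < MAX_OBJS);
    Pair temp = f_model(Foo_x->val[p]);  /* temp := f(sel(Foo_x, p)) */
    Foo_x->val[p] = temp.second;         /* assumed: write second(temp) back */
    return temp.first;                   /* y := first(temp) */
}
```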
Basic Annotations
• Function pre- and postconditions
  – post can refer to pre-state values, return value
• Loop invariants
• Example:
    void scribble_fives(int *p, int len)
      pre(0 < p < objct && 0 <= len <= objsize[p])
      post(forall(int i). 0 <= i < len ==> mem[p,i] == 5)
      changes(mem);
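A plausible body for scribble_fives, with the annotations rendered as comments and runtime checks. This is a sketch under assumptions: the slide gives only the signature and contract, so the loop and the helper all_fives are illustrative, and the part of the precondition that talks about allocation and object size cannot be checked at run time in plain C.

```c
#include <assert.h>
#include <stddef.h>

/* Writes 5 into p[0..len-1]. The caller must guarantee that p points to
   an allocated object with room for at least len ints (the precondition). */
static void scribble_fives(int *p, int len) {
    assert(p != NULL && len >= 0);   /* runtime-checkable part of pre */
    for (int i = 0; i < len; i++) {
        /* loop invariant: p[0..i-1] are all 5 */
        p[i] = 5;
    }
}

/* Runtime rendering of the postcondition:
   forall i. 0 <= i < len ==> p[i] == 5 */
static int all_fives(const int *p, int len) {
    for (int i = 0; i < len; i++) {
        if (p[i] != 5)
            return 0;
    }
    return 1;
}
```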
Annotation Extensions
• Global invariant: implicit in pre/post
• Automatic invariant strengthening
• Named predicates
• Aggregation, hiding for 'changes' clauses
• Left half / right half notation for arrays
• Predicates associated with type qualifiers
• User-written annotation assistants
Verification of scull
• Linux device driver, implements random-access files backed by RAM; ~500 lines
[Figure: scull_devices]
Allocation
• Key concept: allocation boundary
[Figure: the maps Mem, Foo_x, and tag laid out over addresses; the range from 0 to objct is marked allocated]
Role/Type tags
• tag : Addr → int
• [p = (Foo *)malloc(sizeof(Foo))] = p := objct; objct := objct + 1; tag := tag{p := Foo_tag}
• C types are effectively first class in model
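The allocation rule can be simulated with a counter and a tag array. This is a minimal sketch assuming that the new tag is recorded at the address just handed out and that address 0 stands for NULL; Heap, heap_new, model_malloc and MAX_ADDR are hypothetical names.

```c
#include <assert.h>

#define MAX_ADDR 16              /* hypothetical bound on model addresses */
enum { No_tag = 0, Foo_tag = 1 };

typedef struct {
    int objct;                   /* allocation boundary */
    int tag[MAX_ADDR];           /* tag : Addr -> int, 0 means untagged */
} Heap;

static Heap heap_new(void) {
    Heap h = { 1, {0} };         /* address 0 plays the role of NULL */
    return h;
}

/* Model of p = (Foo *)malloc(sizeof(Foo)) for an object with tag t. */
static int model_malloc(Heap *h, int t) {
    assert(h->objct < MAX_ADDR);
    int p = h->objct;            /* p := objct */
    h->objct = h->objct + 1;     /* objct := objct + 1 */
    h->tag[p] = t;               /* record the role/type of the new object */
    return p;
}
```

Because the boundary only moves forward, every call returns an address distinct from all earlier ones, which is what lets the model conclude that a fresh object does not alias an existing one.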
Role/Type tags
• Data structure invariants:
    ∀ p. sel(tag, p) = Scull_Dev_tag ⇒ ... p->next ...
• Subtyping: sel(tag, p) <: My_Subclass_tag
• Type-based disequality
• Deallocation
    [free(p)] = tag := upd(tag, p, 0)
• Initialization
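The sketch below illustrates, under assumptions, how tags support the deallocation and disequality uses listed above: freeing clears the tag, an access is permitted only when the tag matches, and two addresses with different tags cannot be the same address. TagMap, may_access_as, provably_distinct and the tag values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ADDR 16   /* hypothetical bound on model addresses */
enum { Scull_Dev_tag = 1, Other_tag = 2 };  /* hypothetical values; 0 = freed */

typedef struct { int tag[MAX_ADDR]; } TagMap;

/* [free(p)] = tag := upd(tag, p, 0) */
static void model_free(TagMap *t, int p) {
    assert(0 < p && p < MAX_ADDR);
    t->tag[p] = 0;
}

/* Condition to establish before treating p as an object with tag `want`.
   It fails for a freed (dangling) address because its tag is 0. */
static bool may_access_as(const TagMap *t, int p, int want) {
    return 0 < p && p < MAX_ADDR && t->tag[p] == want;
}

/* Type-based disequality: differing tags imply differing addresses. */
static bool provably_distinct(const TagMap *t, int p, int q) {
    assert(0 < p && p < MAX_ADDR && 0 < q && q < MAX_ADDR);
    return t->tag[p] != t->tag[q];
}
```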
Threaded Heap
• Heap has a central spanning tree
• Invariant: for every tree pointer, the referent object names that pointer
  – using specification or ghost variables as needed
• Example: ∀ p. sel(tag, p) = Scull_Dev_tag ⇒ p->next ≠ NULL ⇒ p->next->prev = p
scull bugs
• Wrong return code interpretation
    if (pipe_init() > 0) { /* recover from error */ }
• Leak trusted kernel data
    p = kmalloc(4000);
    memcpy(p+i, src, len);
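A user-space sketch of how the two bug patterns might be repaired. The slides do not show the fixes, so both are assumptions: the first supposes the common kernel convention that failure is reported as a negative value, and the second zeroes the buffer before the partial copy so that no stale bytes remain. pipe_init_stub, init_took_error_path and make_buffer are hypothetical stand-ins, and calloc replaces the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4000

/* Hypothetical stand-in for pipe_init(): 0 on success, negative on failure. */
static int pipe_init_stub(int fail) {
    return fail ? -12 : 0;
}

/* Returns 1 if the error path is taken. Testing "< 0" rather than "> 0"
   matches the assumed convention. */
static int init_took_error_path(int fail) {
    if (pipe_init_stub(fail) < 0) {
        /* recover from error */
        return 1;
    }
    return 0;
}

/* Allocate a zeroed buffer, then copy len bytes to offset i.
   Bytes outside [i, i+len) stay zero instead of holding old data.
   Returns NULL if the copy would not fit or allocation fails. */
static unsigned char *make_buffer(size_t i, const unsigned char *src, size_t len) {
    if (i > BUF_SIZE || len > BUF_SIZE - i)
        return NULL;
    unsigned char *p = calloc(BUF_SIZE, 1);
    if (p == NULL)
        return NULL;
    memcpy(p + i, src, len);
    return p;
}
```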
scull results
name     paths  preds  time(s)  description
init        12     49     12.3  alloc/init top array
cleanup      8     50      9.0  dealloc top array
open         8     83      3.8  set private_data
close        1      4      0.7  file state to closed
read        15    330     18.5  int range checks
write       28    874     50.0  checks + data[] alloc
follow       7     46      6.0  list traversal, alloc
trim        38    461     24.5  dealloc list, data[]s
total      117   1897    125.0
Future Work
• Annotate and verify more examples
• Implement a variety of abstraction mechanisms for annotation language
• Automated assistance for the edit-verify-diagnose cycle
• Prove the lemmas that we give to Simplify
Conclusion
• It is feasible to verify difficult properties (like lack of dangling refs) in real code
• Annotation burden is merely a symptom of inadequate annotation abstractions
• Prover's incompleteness can be overcome with lemmas proven with a more powerful prover