Today
More on SPIN and verification (using instrumentation to handle tracking problems and check properties)
Using SPIN for both model checking and “pure” random testing
• How do they compare?
Abstraction, state matching, symbolic execution, random testing, and software model checking (whew!) for coverage with Java PathFinder
Model Checking and Dynamic Analysis
(Software) model checking
• (In principle exhaustive) exploration of a program’s state space

Dynamic analysis
• Analysis of a running program (Ball, FSE ‘99)
• Instrumentation or execution in a virtual environment – e.g., Valgrind
• Testing is a dynamic analysis: the program is executed in order to learn about its behaviors
Key Limitations of Model-Driven Verification

Embedded C code is always executed as an atomic step
• SPIN cannot check properties (such as invariants) within the embedded code
• SPIN cannot interrupt control flow to simulate an interrupt or system reset

Very easy to make mistakes in tracking C memory state; very hard to debug

Unsound abstractions: we need to understand and control them

Use dynamic analysis (instrumentation) to address these limitations
Extending Model Checking with Dynamic Analysis (VMCAI 08)

Simply instrument the program to be model checked
• Use CIL to perform source-to-source modification
• Compile the instrumented program
• Link the model checking harness to the instrumented version of the code

[Diagram: the program to be model checked passes through a CIL module (instrumentation) to yield an instrumented program; this is compiled & linked with the PROMELA harness and the model checker source (pan.c) to produce a model checker with dynamic analysis]
Extending Model Checking with Dynamic Analysis
Most common application: instrument every write to global memory (any address not on the stack)
Integration of dynamic analysis with model checking and testing, in order to:
• Check modifies clauses and other properties
• Simulate system reset (for spacecraft code)
• Debug the model checking harness
• Compute fine-grained coverage measures
• Introduce novel search heuristics
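As a rough sketch, the global-write instrumentation might look like this in C – the hook name on_global_write and its signature are illustrative assumptions, not CIL’s actual output:

/* Sketch of CIL-style instrumentation of global writes.
   The hook name and signature are assumptions. */
#include <stdio.h>

int g;                               /* global: writes are instrumented */

void on_global_write(void *addr) {   /* called before each global write */
    /* a property check, reset counter, or coverage update goes here */
    printf("global write at %p\n", addr);
}

void set_g(int v) {
    int local = v;                   /* stack-local: not instrumented */
    on_global_write(&g);             /* call inserted by the CIL pass */
    g = local;
}

int main(void) { set_g(42); return 0; }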
Extending Model Checking with Dynamic Analysis

Key benefits:
• Instrumented code runs at native speed
• Easy to use – just extend the CIL hook function called on every global memory access
• Property checks can also be used during testing (ICSE 07), or as runtime monitors in deployed code (Klaus!)
• Relatively low overhead: the model checker itself is not instrumented, and storing and matching states (the DFS) takes most of the execution time
• Still get all of SPIN’s benefits – an extremely efficient model checker
Applications: Checking Modifies Clauses
Modifies clauses
• Specify which variables a function may alter
• Used in Larch, ESC/Java, and JML specs
• More generally: which memory locations a block of code is permitted to modify

modifies (p, q, *x, list)

int i;
p = 3;      /* ok */
q = a + 7;  /* ok */
*x = q;     /* ok */
i = 9;      /* ok – i is stack local */
x = &p;     /* NOT OK – only *x */
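A minimal sketch of how the write hook could enforce such a clause – the permitted-set representation and the names allow and on_global_write are assumptions, not the tool’s API:

/* Sketch: a modifies-clause check in the global-write hook.
   Names and the permitted-set representation are assumptions. */
#include <assert.h>

#define MAX_LOCS 16
static void *permitted[MAX_LOCS];   /* locations the clause allows */
static int n_permitted;

void allow(void *addr) { permitted[n_permitted++] = addr; }

void on_global_write(void *addr) {  /* called before each global write */
    for (int i = 0; i < n_permitted; i++)
        if (permitted[i] == addr)
            return;                 /* write is covered by the clause */
    assert(0 && "write outside modifies clause");
}

int p, q;

int main(void) {
    allow(&p); allow(&q);           /* modifies (p, q) */
    on_global_write(&p); p = 3;     /* ok */
    on_global_write(&q); q = 7;     /* ok */
    return 0;
}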
Applications: Checking Modifies Clauses
Used to check memory access properties of a safe replacement for the C string library used in a flight file system

Also used to debug and troubleshoot memory specifications for the SPIN harness
• Warns when memory that is not tracked or matched is modified by C code
• Can specify that some memory is known to be safely ignored
• Greatly improves the ease of use of model-driven verification with SPIN
Applications: Checking Modifies Clauses

[Diagram: the model-driven verification DFS loop – execute C code until control returns to SPIN; push the tracked & matched state on the stack; if the state has been visited before, backtrack (pop the stack and restore the tracked & matched state); otherwise store the matched state in the state table. Annotations show that if some memory should have been tracked but was not, backtracking produces a “hybrid” state – part current state, part old state – that is not actually reachable in any real program execution: very hard to debug!]
Applications: Simulating Warm Resets
In a warm reset, all data on the program stack is cleared and the software is reset to an entry location, but global data is not cleared
Used in JPL flight software:
• RAM file system for data products (images and science data) that survives system reboot

Data may be corrupted
• Need to model check to verify that corruption is detected and data is preserved across reset
Applications: Simulating Warm Resets
Use CIL instrumentation to decrement a reset counter (set by the SPIN harness) at every access to global memory
If the counter reaches zero, simulate a reset by jumping back to SPIN, after (possibly) corrupting the memory location
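A minimal sketch of this mechanism, assuming hypothetical names (reset_counter, harness_env, on_global_access) and using longjmp to stand in for “jumping back to SPIN”:

/* Sketch of simulating a warm reset. In the real tool the counter
   is set by the PROMELA harness; names here are assumptions. */
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

static int reset_counter;     /* set by the harness before the C call */
static jmp_buf harness_env;   /* setjmp'd by the harness around the call */

/* Hook inserted by CIL at every global memory access */
void on_global_access(unsigned char *addr) {
    if (reset_counter > 0 && --reset_counter == 0) {
        if (rand() % 2)
            *addr ^= 0xFF;            /* possibly corrupt this location */
        longjmp(harness_env, 1);      /* "reset": jump back to the harness */
    }
}

unsigned char g;

int main(void) {
    reset_counter = 3;                /* reset on the 3rd global access */
    if (setjmp(harness_env)) {        /* re-entered after the "reset" */
        printf("warm reset simulated; g = %d\n", g);
        return 0;
    }
    for (;;) { on_global_access(&g); g++; }  /* instrumented accesses */
}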
Found a very subtle bug: checksum was too weak to detect some corruptions
Also used this instrumentation in random testing of the module
Applications: Computing Coverage
Often want to measure the coverage of some abstraction of the program state space
• For example, in flight flash file systems: the state of the flash device

[Diagram: a flash device made up of blocks containing used pages, free pages, dirty pages, and bad blocks]

Abstraction: live pages? (0-1) x dirty pages (0-1) x block state (bad, free, current)
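A sketch of an abstraction function along these lines – the types and field names are assumptions:

/* Sketch: fold concrete flash-device state down to the abstract
   state "live pages? x dirty pages? x block state".
   Types and field names are assumptions. */
#include <stdio.h>

enum block_state { BAD, FREE, CURRENT };

struct block {
    enum block_state state;
    int live_pages;    /* pages holding current data */
    int dirty_pages;   /* pages invalidated by rewrites */
};

/* 2 x 2 x 3 = 12 abstract values per block, encoded as a small
   integer so coverage can be tracked in a table */
int abstract_block(const struct block *b) {
    int has_live  = b->live_pages  > 0;
    int has_dirty = b->dirty_pages > 0;
    return (has_live << 3) | (has_dirty << 2) | (int)b->state;
}

int main(void) {
    struct block b = { CURRENT, 2, 0 };   /* live, not dirty, current */
    printf("abstract value: %d\n", abstract_block(&b));
    return 0;
}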
Applications: Computing Coverage
Easy to compute the abstract state after each call from the harness (write, compact, etc.), and report coverage
• But this captures only the “stable states”
• A flash operation may put the device into many intermediate states
• Use CIL instrumentation to compute the full coverage by checking the abstract state after every call to the flash device driver
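A sketch of the bookkeeping for intermediate-state coverage – hook placement and names are assumptions, and abstract_state is a stand-in for the real abstraction function:

/* Sketch: record abstract-state coverage after every call into the
   flash driver, capturing intermediate (non-"stable") states. */
#define N_ABSTRACT 16              /* bound on the abstract encoding */

static int seen[N_ABSTRACT];       /* which abstract states were covered */
static int n_seen;

/* Stand-in for the real abstraction over the device state */
static int abstract_state(void) { return 0; }

/* Hook inserted by CIL after each return from the flash driver */
void on_driver_return(void) {
    int a = abstract_state();
    if (!seen[a]) { seen[a] = 1; n_seen++; }
}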
Applications: Coverage-based Heuristics
CIL instrumentation computes a bit vector indicating the local control path of a function – i.e., this code
if (x == 3) {
    x++;
    if (y > 0) {
        y++;
    }
} else {
    x--;
}
becomes
if (x == 3) {
    add_to_bv(pathBV, 1);
    x++;
    if (y > 0) {
        add_to_bv(pathBV, 1);
        y++;
    } else {
        add_to_bv(pathBV, 0);
    }
} else {
    add_to_bv(pathBV, 0);
    x--;
}
Applications: Coverage-based Heuristics
SPIN matches on pathBV (which is limited to some finite size)
Clear pathBV before each entry into the C code from the model checking harness
SPIN will not backtrack if the path through the called code is new, even if the state otherwise matches
Model checking augmented to greedily improve path coverage
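A sketch of how pathBV and add_to_bv might be implemented – a bounded shift register of branch decisions; the encoding is an assumption (the real layout may differ) and assumes 32-bit unsigned int:

/* Sketch of a bounded path bit vector matching the calls above:
   a shift register, so only the most recent branch decisions
   (bits of path information) are kept. */
#define PATHBV_WORDS 1   /* e.g., up to 32 bits of path information */

unsigned int pathBV[PATHBV_WORDS];  /* tracked & matched by SPIN */

void clear_bv(unsigned int *bv) {   /* called on each entry from the harness */
    for (int i = 0; i < PATHBV_WORDS; i++) bv[i] = 0;
}

void add_to_bv(unsigned int *bv, unsigned int bit) {
    /* shift the vector left one position, OR in the new decision */
    for (int i = PATHBV_WORDS - 1; i > 0; i--)
        bv[i] = (bv[i] << 1) | (bv[i - 1] >> 31);
    bv[0] = (bv[0] << 1) | (bit & 1u);
}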
Applications: Coverage-based Heuristics
Why is this useful?
• Because when checking rich properties of complex programs, we use unsound abstractions to reduce the state space to a manageable size
• Matching on the local path causes SPIN to explore states that induce new paths, even if they appear to be the same under the abstraction

Applied to a 5K-line critical storage module to be used in upcoming missions
• Tracking only pathBV, with enough bits, covered the abstract state space nearly as well as – or better than – matching on the abstract state!
Reachability for pathBV Search

[Chart: abstract flash states covered (0-60) vs. bits of path information (0-25). A marked level shows the coverage reached when matching on the abstract state; pathBV matching approaches it as the bit budget grows. There is not much information at the low end – that part is likely an artifact of DFS “luck”.]
Reachability for pathBV Search

[Chart: write-ordering states covered (0-250) vs. bits of path information (0-25), for the same model under another (unsound) abstraction. Here fewer bits give more coverage!]
Applications: Coverage-based Heuristics
Preliminary results
• We need to apply this to more large models
• Further explore its use in combination with unsound abstractions
• DFS exploration is known to be highly sensitive to search ordering (Dwyer et al., FSE 2006), so it is hard to draw conclusions yet
• Early results for detecting seeded errors are less impressive – pathBV matching performs worse than matching on abstractions, even when it gets better coverage
Managing Unsound Abstractions

Computing coverage and guiding the search based on structural properties:
• Examples of using dynamic analysis to “manage” – understand and “improve” – the unsound abstractions we often have to use
• With state spaces this large and properties too complex for automated sound abstraction (as in flight software), we have to use techniques like those used to understand a testing effort:
  • What “happened” during these runs?
  • What did we cover?
  • What didn’t we cover?
  • Can we cover a bit more, without changing the abstraction?
Overhead for Dynamic Analysis

Model Checking
                       Uninstrumented      Instrumented
Program                Time     SPIN       Time     SPIN     Check    Slowdown   Type
NVDS-1                 123.8    95.0%      137.1    88.0%    5.20%    10.50%     track
NVDS-1 (bitstate)      581.9    93.0%      621.3    86.0%    3.03%    6.80%      track
NVDS-2                 437.4    93.0%      490.8    89.7%    2.08%    12.20%     pathBV(20)
Launchseq              97.6     99.0%      98.3     98.0%    0.06%    0.70%      track
n_strncpy              34.6     99.5%      34.9     99.4%    0.22%    0.86%      modifies
n_strncat              29.3     99.6%      29.4     99.5%    0.04%    0.34%      modifies

Testing
                       Uninstrumented      Instrumented
Program                Time     Test       Time     Test     Check    Slowdown   Type
stringtest             202.9    80.0%      250.3    41.1%    24.40%   23.40%     modifies

Slowdown for model checking never exceeds 12.2%, and is often below 1%
• Uninstrumented SPIN code dominates execution time

For testing, overhead is higher because the test engine is less expensive relative to program execution

In general, very low overhead – lower, perhaps, than is typical with dynamic analysis during testing, because the model checker is not instrumented yet consumes most of the execution time; we find this approach very practical with flight software models
Conclusions
Automatic instrumentation is very useful in conjunction with model checking – as it is in testing and monitoring
• Useful and easy to apply
• Relatively low overhead
• Can apply a dynamic analysis exhaustively to the entire program state space

Useful for managing the abstractions we use to reduce our state space
Using SPIN for True Random Testing
Want to apply both methods – the state graphs are too large to explore completely
• Is a random walk better, or something more systematic? Hard to be sure

It would be nice not to write two testers – one for random testing, one for state-based testing
• The basic harness looks the same, the property checks look the same, etc.
The pick Macro, Revisited

inline pick (var, MAX) {
    var = 0;
    do
    :: (var < MAX) -> var++   /* nondeterministic: SPIN explores */
    :: break                  /* every value of var from 0 to MAX */
    od
}
What if we change pick?
The pick Macro, Revisited

inline pick (var, MAX) {
    if
    :: !initialized ->
        /* first pick: nondeterministically choose a seed (a real
           SPIN branch point), then seed the C PRNG with it */
        nondet_pick(seed, SEED_RANGE);
        c_code{ printf("Test with seed %d\n", now.seed); srandom(now.seed); };
        initialized = 1
    :: else -> skip
    fi;
    /* later picks: one random value, not a backtracking point */
    var = c_expr{random()} % MAX;
}
To this?
Coverage of nvds_box.c

[Chart: % coverage of nvds_box.c (78-87%) vs. minutes (0-200), model checking vs. random testing]
Coverage of nvfs_pub.c

[Chart: % coverage of nvfs_pub.c (75.2-75.55%) vs. minutes (0-200), model checking vs. random testing]
Coverage of flash abstraction

[Chart: abstract states covered (0-70) vs. minutes (0-200), model checking vs. random testing]
Coverage of page abstraction

[Chart: abstract states covered (0-40) vs. minutes (0-200), model checking vs. random testing]