TRANSCRIPT
Lazy Systematic Unit Testing for Java
Anthony J H Simons Christopher D Thomson
Overview
Lazy Systematic Unit Testing: testing concepts and methodology
The JWalk Tester tool: flagship of the JWalk 1.0 toolset
Dynamic analysis and pruning: smart interactive generation and evaluation
Oracle building and test prediction: building a test oracle with minimal user interaction
Head-to-head evaluation: a testing contest, JWalk versus JUnit
http://www.dcs.shef.ac.uk/~ajhs/jwalk/
Motivation
State of the art in agile testing: test-driven development is good, but there is no specification to inform the selection of tests, and manual test-sets are fallible (missing, redundant cases). Can we do better in test-case selection?
Regression testing, a touchstone? There are no specifications in XP, so saved tests are used instead, which become guarantors of correct behaviour. It is an article of faith that passing saved tests guarantees no faults were introduced in the modified unit. Actually no: state partitions cause a geometric decrease in effective state coverage (Simons, 2005).
Regression Testing Model
Base object proven correct by basic test set
Derived object refines Base object in some way
Basic test set used to test regression in Derived object
Passing regression tests proves that Derived conforms to Base
But this is an unreliable assumption!
[Diagram: Btest proves Base; Derived refines Base; Derived conforms to Base]
Test assumption: retesting “proves” compatible behaviour
Coverage of Base
[State diagram: states Discharged (¬ isOnLoan()) and Issued (isOnLoan()); transitions new(), issue(a), discharge(); observer borrower() raises an error in Discharged and is OK in Issued]
All pairs validated: T2 = C (L0 L1 L2) P
Reach every state and validate every transition pair
Coverage of Derived
[State diagram: the Derived model refines Discharged and Issued into the high-level states OnShelf (¬ reserved()), OnLoan (¬ reserved()), PutAside (reserved()) and Recalled (reserved()), with transitions new(), issue(a), discharge(), reserve(b), cancel(); observer borrower() raises an error when discharged and is OK when issued]
Only some pairs reached: reusing the same T2 test-set does not cover the refined model
Test Regeneration Model
[Diagram: Btest proves Base; Dtest proves Derived; Derived refines Base, so Derived transitively conforms to Base]
Only base object proven correct by basic test set
Derived object requires all-new tests, regenerated from derived specification
Derived object conforms to derived spec. by testing
Derived spec. conforms to base spec. by verification
Derived object conforms transitively to base spec.
New idea: conformity proven by both verification and testing
The Conundrum
Regression testing is too weak: saved tests don’t exercise the refined model; manual extra tests don’t cover all path combinations; the regression guarantee is progressively weakened.
Test regeneration is more reliable: all-new tests are generated from a refined specification; automatically generated tests cover all path combinations; there is a guarantee of repeatable test quality (for Tk).
How to replicate this for agile methods? There is no up-front specification from which to generate tests; the only artifact is the evolving code, which changes. Can we make any use of this?
Lazy Systematic Unit Testing
Lazy Specification: late inference of a specification from evolving code; semi-automatic, by static and dynamic analysis of the code with limited user interaction; the specification evolves in step with the modified code.
Systematic Testing: bounded exhaustive testing, up to the specification; emphasis on completeness, conformance and correctness properties after testing; repeatable test quality.
http://en.wikipedia.org/wiki/Lazy_systematic_unit_testing
JWalk Tester
Lazy systematic unit testing for Java:
static analysis: extracts the public API of a compiled Java class
protocol walk (all paths): explores and validates all interleaved methods to a given path depth
algebra walk (memory states): explores and validates all observations on all mutator-method sequences
state walk (high-level states): explores and validates an n-switch transition cover for all high-level states
http://www.dcs.shef.ac.uk/~ajhs/jwalk/
Try me
Example: Stack
Analysis of the API (protocol, algebra)
Test reports for each test cycle
Test statistics and summary report
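The Stack class itself is not shown in the slides; a minimal sketch of the kind of class being walked (hypothetical: the field names and the choice of IllegalStateException for empty-stack errors are assumptions) might be:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test class: a minimal Stack of the kind JWalk might walk.
// pop() and top() throw on an empty stack, which is what makes the
// exception-pruning of later slides effective.
public class Stack {
    private final List<Object> items = new ArrayList<>();

    public void push(Object e) { items.add(e); }

    public Object pop() {
        if (items.isEmpty()) throw new IllegalStateException("empty stack");
        return items.remove(items.size() - 1);
    }

    public Object top() {
        if (items.isEmpty()) throw new IllegalStateException("empty stack");
        return items.get(items.size() - 1);
    }

    public int size() { return items.size(); }
}
```

JWalk needs only the compiled class: it discovers push, pop, top and size by static analysis, then explores their interleavings dynamically.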
Load the Test Class
Choose a location: the working directory; the root of a package is its parent directory.
Choose a test class: browse for the test class within a directory, or browse for a package-qualified class within a package.
Shortcut: type the (qualified) test class name directly.
Pick Settings and Go
Strategy: protocol (all methods), algebra (all constructions), or states (all states and transitions).
Modality: inspect (the interface), explore (exercise paths), or validate (against the oracle).
Test depth: maximum path length.
Start testing: click on the JWalker to run a test series.
Protocol Inspection
Protocol analysis: static analysis of the public API of the test class; includes all inherited public methods; may or may not include the standard Object methods: specify this through the custom settings.
Algebraic Inspection
Algebraic analysis: dynamic analysis of the algebraic categories: primitive, transformer and observer operations.
Technique: compares concrete object states; identifies unchanged or re-entrant states; controlled by probe-depth and state-depth (custom settings).
State Inspection
State analysis: dynamic analysis of high-level states; automatically names the discovered states; computes the state cover.
Technique: based on public state predicate methods; seeks the boolean state product (fails gracefully); controlled by probe-depth.
Baseline Approaches
Breadth-first generation: all constructors and all interleaved methods (eg JCrasher, DSD-Crasher, Jov); or generate-and-filter by state equivalence class (eg Rostra, Java Pathfinder).
Computational cost: exponential growth, memory issues, and wasteful over-generation, even if filtering is later applied:
#paths = Σ c·m^k, for k = 0..n
Key: c = #constructors, m = #methods, k = depth
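The growth is easy to check numerically. A small sketch of the formula (with c = 1 constructor and m = 6 methods, the values that reproduce the Stack baseline column in Table 1 later in the talk):

```java
// Cumulative number of paths tried by breadth-first generation:
//   #paths = sum over k = 0..n of c * m^k
// where c = #constructors, m = #methods, n = test depth.
public class PathCount {

    static long paths(int c, int m, int n) {
        long total = 0;
        long power = 1;                   // power = m^k
        for (int k = 0; k <= n; k++) {
            total += (long) c * power;
            power *= m;
        }
        return total;
    }

    public static void main(String[] args) {
        // c = 1, m = 6 reproduces the Stack baseline column:
        // 1, 7, 43, 259, 1555, 9331
        for (int n = 0; n <= 5; n++)
            System.out.println("depth " + n + ": " + paths(1, 6, n));
    }
}
```

With c = 1 and m = 8, the same formula gives 37449 at depth 5, the baseline figure in Table 2 for the reservable book.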
Dynamic Pruning
Interleaved analysis: generate-and-evaluate, pruning active paths on the fly (eg JWalk, Randoop); redundant prefix paths are removed after each test cycle, so there is no need to expand them in the next cycle.
Increasing sophistication:
prune prefix paths ending in exceptions (they fail again): JWalk, Randoop (2007)
and prefixes ending in algebraic observers (unchanged state): JWalk 0.8 (2007)
and prefixes ending in algebraic transformers (re-entrant state): JWalk 1.0 (2009)
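The simplest level, exception pruning, can be sketched in a few lines (an illustration of the idea, not JWalk's actual code), here driving java.util.Stack with three methods:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Stack;
import java.util.function.Consumer;

// Generate-and-evaluate with exception pruning: a path whose last call
// raised an exception is recorded as a result but never extended in the
// next test cycle.
public class PruningSketch {

    static final Map<String, Consumer<Stack<Integer>>> METHODS = Map.of(
            "push", s -> s.push(1),
            "pop",  Stack::pop,    // throws EmptyStackException when empty
            "top",  Stack::peek);  // throws EmptyStackException when empty

    // Replay a path on a fresh target; true if it completes normally.
    static boolean runsNormally(List<String> path) {
        Stack<Integer> target = new Stack<>();
        try {
            for (String m : path) METHODS.get(m).accept(target);
            return true;
        } catch (RuntimeException ex) {
            return false;
        }
    }

    // Explore all method paths up to the given depth, pruning error prefixes.
    static List<List<String>> explore(int depth) {
        List<List<String>> frontier = new ArrayList<>();
        frontier.add(List.of());                     // the bare new() path
        List<List<String>> explored = new ArrayList<>(frontier);
        for (int cycle = 1; cycle <= depth; cycle++) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : frontier)
                for (String m : METHODS.keySet()) {
                    List<String> path = new ArrayList<>(prefix);
                    path.add(m);
                    explored.add(path);
                    if (runsNormally(path)) next.add(path);  // prune failures
                }
            frontier = next;
        }
        return explored;
    }

    public static void main(String[] args) {
        // Depth 2: the baseline would try 1 + 3 + 9 = 13 paths; pruning the
        // two failing depth-1 paths (pop, top) leaves 1 + 3 + 3 = 7.
        System.out.println(explore(2).size());
    }
}
```

The baseline grows as m^k, while the pruned frontier grows only from the non-failing prefixes, which is the pattern visible in Tables 1 and 2.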
Protocol Exploration
Protocol strategy: explores all interleaved methods by brute force; explores all paths up to length n (the test depth); repeats invocations of the same method.
Pruning: paths raising exceptions in test cycle i are not extended in test cycle i+1.
Baseline
[Diagram: brute-force, breadth-first exploration of all new/push/top/pop paths; key: novel state, exception]
Prune Exceptions…
[Diagram: the same exploration with error-prefixes pruned (JWalk 0.8, Randoop); key: novel state, exception]
Algebraic Exploration
Algebraic strategy: explores all algebraic constructions; grows paths using only primitive operations; observes paths ending in any kind of operation.
Pruning: prunes paths ending in exceptions (in the next cycle); also paths ending in re-entrant or unchanged states.
Prune Observers
[Diagram: exploration with error- and observer-prefixes pruned (JWalk 0.8); key: novel state, exception, unchanged state]
…Transformers
[Diagram: exploration with error-, observer- and transformer-prefixes pruned (JWalk 1.0); key: novel state, exception, unchanged state, re-entrant state]
State Exploration
State strategy: reaches every high-level state; explores all transition paths up to length n from each state; achieves n-switch coverage.
Pruning: grows only primitive paths to reach all states; prunes paths ending in exceptions (in the next cycle).
Exploration Summary
Test settings: test class, strategy, modality, depth.
Exploration summary: # executed in total; # discarded (pruned); # exercised (normal); # terminated (exception).
Technique: the number discarded is calculated from the theoretical maximum number of paths.
The Same State?
Some earlier approaches: distinguish observers and mutators by signature (Rostra); intrusive state equality predicate methods (ASTOOT); external (partial) state equality predicates (Rostra); subsumption of execution traces in the JVM (Pathfinder).
Some algebraic approaches: shallow and deep equality under all observers (TACCLE), but this assumes the observations are also comparable and is very costly to compute from first principles; serialise object states and hash them (Henkel & Diwan), but not all objects are serialisable and there is no control over the depth of comparison.
State Comparison
Reflection-and-hash: extract the state vector from each object; compute a hash code for each field; combine them into an order-sensitive hash code.
Proper depth control: shallow or deep equality settings, to a chosen depth; hash on the pointer, or recursively invoke the algorithm.
Fast state comparison: each test evaluation stores the posterior state code; fast comparison with the preceding state, or with all prior states; possible to detect unchanged or re-entrant states.
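The shallow case of reflection-and-hash can be sketched as follows (an illustration of the idea, not JWalk's actual algorithm; JWalk additionally recurses to a chosen state-depth):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Reflection-and-hash: extract the state vector by reflection, hash each
// field, and fold the hashes in sequence so the combination is
// order-sensitive.
public class StateHash {

    // Shallow state code: hash on each field value's own hashCode, rather
    // than recursing into sub-objects.
    public static int stateCode(Object target) {
        int code = 17;
        for (Field f : target.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue;  // instance state only
            f.setAccessible(true);
            Object value;
            try {
                value = f.get(target);
            } catch (IllegalAccessException ex) {
                throw new IllegalStateException(ex);
            }
            code = 31 * code + (value == null ? 0 : value.hashCode());
        }
        return code;
    }
}
```

Two objects in the same concrete state always receive the same code, so the stored posterior code from each test evaluation can be compared cheaply against prior codes to detect unchanged or re-entrant states (with the usual caveat that distinct states can, rarely, collide on the same hash).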
Pruning: Stack
Depth   baseline   except.   observ.   transf.
0             1         1         1         1
1             7         7         7         7
2            43        31        13        13
3           259       139        25        19
4          1555       667        43        25
5          9331      3391        79        31

Pruned: 9,300 redundant paths. Retained: 31 significant paths (at best, 0.33%).
Table 1: Cumulative paths explored after each test cycle (Stack)
Pruning: Reservable Book
Depth   baseline   except.   observ.   transf.
0             1         1         1         1
1             9         9         9         9
2            73        73        25        25
3           585       561        49        33
4          4681      4185        97        41
5         37449   mem. ex.      169        41

Pruned: 37,408 redundant paths. Retained: 41 significant paths (at best, 0.12%).
Table 2: Cumulative paths explored after each test cycle (ReservableBook)
Validation Modality
Lazy specification: interacts with the tester to confirm key results; uses predictive rules to infer further results; stores key results in a reusable test oracle.
Technique: key results are found at the leaves of the algebra tree; the predictions are applied to the other test strategies; the tester accepts or rejects each outcome.
Test Result Prediction
Semi-automatic validation: the user confirms or rejects key results; these constitute a test oracle, used in prediction; eventually > 90% of test outcomes are predicted.
JWalk test result prediction rules:
eg predict repeat failure: new().pop().push(e) == new().pop()
eg predict same state: target.size().push(e) == target.push(e)
eg predict same result: target.push(e).pop().size() == target.size()
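Two of these rules can be checked concretely; using java.util.Stack as a stand-in target (an illustration, not JWalk's oracle):

```java
import java.util.Stack;

// Checking the "same state" and "same result" prediction rules on
// java.util.Stack: size() is an observer, so calling it before push(e)
// leaves the same state; push(e).pop() is re-entrant, so size() afterwards
// equals size() before.
public class PredictionRules {
    public static void main(String[] args) {
        // same state: target.size().push(e) == target.push(e)
        Stack<String> a = new Stack<>();
        a.size();                          // observer: no state change
        a.push("e");
        Stack<String> b = new Stack<>();
        b.push("e");
        System.out.println(a.equals(b));   // true: same state

        // same result: target.push(e).pop().size() == target.size()
        Stack<String> t = new Stack<>();
        int before = t.size();
        t.push("e");
        t.pop();
        System.out.println(t.size() == before);   // true: same result
    }
}
```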
Try me
Kinds of Prediction
Strong prediction: from known results, guarantee further outcomes in the same equivalence class. eg observer prefixes: empirically checked before making any inference, the unchanged state is guaranteed: target.push(e).size().top() == target.push(e).top()
Weak prediction: from known facts, guess further outcomes; an incorrect guess will be revealed in the next cycle. eg methods with void type usually return no result, but may raise an exception: target.pop() is predicted to have no result; target.pop().size() == -1 reveals an error.
Algebraic Validation
Algebraic testing: grows all primitive paths ending in all operations; solicits results for the leaves of the algebra tree; the best mode in which to create an oracle.
Prediction: predicts void results; predicts results saved in previous test cycles.
The oracle predicts a correct outcome; the tester confirms an outcome.
Protocol Validation
Protocol testing: create the oracle first using the algebra strategy, then apply the same oracle in the protocol strategy; most results are predicted!
Prediction: (chains of) observers don’t affect states; re-entrant methods return to earlier states.
The oracle predicts many outcomes.
State Validation
State testing: extends the oracle created for the algebra strategy; can validate thousands of transition paths for a mere few tens of user confirmations.
Prediction: all results for “nearby” states are predicted; confirmations are needed for the more “remote” states.
The oracle predicts many outcomes.
Validation Summary
Test summary: other statistics as before.
Validation summary: # passed (in total); # failed (in total); # confirmed (by user); # rejected (by user); # correct (by oracle); # incorrect (by oracle).
10x automated vs manual checks.
Amortized Interaction Costs
The number of new confirmations, amortized over 6 test cycles. con = manual confirmations, at > 25 test cases/minute; pre = JWalk’s predictions, eventually > 90% of test cases.

Test class   a1   a2   a3   s1   s2    s3
LibBk con     3    5    7    0    0     5
LibBk pre     2    8   18   18   38   133
ResBk con     3   14   56    0   11    83
ResBk pre     6   27   89   36  241  1649

eg algebra-test to depth 2: 14 new confirmations
eg state-test to depth 2: 241 predicted results
Feedback-based Methodology
Coding: the programmer prototypes a Java class in an editor.
Exploration: JWalk systematically explores method paths, providing useful instant feedback to the programmer.
Specification: JWalk infers a specification, building a test oracle based on key test results confirmed by the programmer.
Validation: JWalk tests the class to bounded exhaustive depths, based on confirmed and predicted test outcomes, using state-based test generation algorithms.
Example – Library Book
Exploration surprise: target.issue(“a”).issue(“b”).getBorrower() == “b” violates the business rules: fix the code to raise an exception.
Validation: all observations on chains of issue(), discharge(); n-switch cover on the states {Default, OnLoan}.

public class LibraryBook {
    private String borrower;
    public LibraryBook();
    public void issue(String);
    public void discharge();
    public String getBorrower();
    public Boolean isOnLoan();
}
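A minimal implementation consistent with these signatures and with the fixed business rule is sketched below (hypothetical: the slides say only "raise an exception", so the choice of IllegalStateException is an assumption):

```java
// Hypothetical implementation of the slide's LibraryBook, after the fix
// prompted by the exploration surprise: issuing a book that is already on
// loan now raises an exception rather than silently replacing the borrower.
public class LibraryBook {
    private String borrower;

    public void issue(String reader) {
        if (borrower != null)
            throw new IllegalStateException("already on loan");
        borrower = reader;
    }

    public void discharge() {
        borrower = null;   // a null-op when the book is already discharged
    }

    public String getBorrower() {
        return borrower;
    }

    public Boolean isOnLoan() {
        return borrower != null;
    }
}
```

Note that discharge() on a discharged book is a deliberate null-op, the corner case the manual tester missed in the later comparison.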
Extension – Reservable Book
Exploration: only revisits novel interleaved permutations of methods; surprise: target.reserve(“a”).issue(“b”).getBorrower() == “b”.
Validation: all observations on chains of issue(), discharge(), reserve(), cancel(); n-switch cover on the states {Default, OnLoan, Reserved, Reserved&OnLoan}.

public class ReservableBook extends LibraryBook {
    private String requester;
    public ReservableBook();
    public void reserve(String);
    public void cancel();
    public String getRequester();
    public Boolean isReserved();
}
Evaluation
User acceptance: programmers find JWalk habitable; they can concentrate on the creative aspects (coding) while JWalk handles the systematic aspects (validation, testing).
The main cost is confirmations: not so burdensome, since they are amortized over many test cycles; metric: measure the amortized confirmations per test cycle.
Comparison with JUnit: a common testing objective for manual and lazy systematic testing; evaluate coverage and testing effort. Eclipse+JUnit vs. JWalkEditor, given the task of testing the “transition cover + all equivalence partitions of inputs”.
Comparison with JUnit (manual testing method)
Manual test creation takes skill, time and effort (eg ~20 min to develop the manual cases for ReservableBook).
The programmer missed certain corner cases, eg target.discharge().discharge() - a null-op?
The programmer redundantly tested some properties, eg assertTrue(target != null) - multiple times.
The state coverage for LibraryBook was incomplete, due to the programmer missing hard-to-see cases.
The saved tests were not reusable for ReservableBook, for which all-new tests were written to test the new interleavings.
Advantages of JWalk
JWalk automates test case selection, relieving the programmer of the burden of thinking up the right test cases!
Each test case is guaranteed to test a unique property.
Interactive test result confirmation is very fast (eg ~80 sec in total for the 36 unique test cases in ReservableBook).
All states and transitions are covered, including null-ops, to the chosen depth.
The test oracle created for LibraryBook formed the basis for the new oracle for ReservableBook, but JWalk presented only those sequences involving new methods, and all their interleavings with inherited methods.
Measuring the Testing?
Suppose an ideal test set:
BR : behavioural response (a set)
T : tests to be evaluated (a bag - duplicates possible)
TE = BR ∩ T : effective tests (a set)
TR = T − TE : redundant tests (a bag)
Define test metrics:
Ef(T) = (|TE| − |TR|) / |BR| : effectiveness
Ad(T) = |TE| / |BR| : adequacy
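These two metrics are simple ratios, and can be checked against the figures on the next slide (a sketch; the sizes |BR| = 10 for LibraryBook and |BR| = 40 for ReservableBook are back-computed from the reported adequacy percentages, not stated directly):

```java
// Computing the two test metrics of the "Measuring the Testing?" slide.
public class TestMetrics {

    // Ad(T) = |TE| / |BR| : adequacy
    static double adequacy(int te, int br) {
        return (double) te / br;
    }

    // Ef(T) = (|TE| - |TR|) / |BR| : effectiveness
    static double effectiveness(int te, int tr, int br) {
        return (double) (te - tr) / br;
    }

    public static void main(String[] args) {
        // LibBk manual: TE = 9, TR = 22, assumed |BR| = 10
        System.out.println(adequacy(9, 10));          // 0.9, i.e. 90%
        // ResBk manual: TE = 21, TR = 83, assumed |BR| = 40
        System.out.println(adequacy(21, 40));         // 0.525, reported as 53%
        // ResBk jwalk: TE = 36, TR = 0
        System.out.println(effectiveness(36, 0, 40)); // 0.9
    }
}
```

Note that a heavily redundant manual suite can have negative effectiveness even while its adequacy looks respectable, which is exactly the JUnit pathology the talk describes.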
Speed and Adequacy of Testing
Test goal: transition cover + equivalence partitions of inputs. Manual testing was expensive, redundant and incomplete; JWalk testing was very efficient and close to complete.
eg the programmer wrote 104 tests, of which 21 were effective and 83 were not!
eg JWalk achieved 100% test coverage.

Test class     T    TE   TR   Adeq.   Time (min.sec)
LibBk manual   31    9   22    90%    11.00
ResBk manual  104   21   83    53%    20.00
LibBk jwalk    10   10    0   100%     0.30
ResBk jwalk    36   36    0    90%     0.46
Some Conclusions
JUnit: expert manual testing: massive over-generation of tests (w.r.t. the goal); sometimes adequate, but not effective; stronger (t2, t3), duplicated, and missed tests; hopelessly inefficient - it also requires debugging the test suites!
JWalk: lazy systematic testing: near-ideal coverage, adequate and effective; a few input partitions missed (simple generation strategy); very efficient use of the tester’s time: seconds, not minutes; or, orders of magnitude (x 1000) more tests for the same effort.
More Conclusions
Feedback-based development: an unexpected gain was the automatic validation of prototype code; c.f. Alloy’s model checking from a partial specification.
Moral for testing: automatically executing saved tests is not so great; systematic test generation tools are needed to get coverage; automate the parts that humans get wrong, and let humans focus on judging right/wrong responses.
JWalk 1.0 Toolset
JWalk Tester JWalk Utility JWalk Editor
JWalk Marker JWalk Grapher JWalk SOAR
Example: JWalk Editor
© Neil Griffiths, 2008
Any Questions?
http://www.dcs.shef.ac.uk/~ajhs/jwalk/
Put me to the test!
© Anthony Simons, 2009, with help from Chris Thomson, Neil Griffiths, Mihai Gabriel Glont, Arne-Michael Toersel
Custom Configuration
Oracle directory: the default is the test class directory; pick a new location if required.
Convention: standard excludes all of Object’s methods; custom includes some; complete includes all.
Probe depth: the maximum path length for dynamic analysis.
State depth: the tree depth for object state comparison; shallow state (including array values) by default.
Generators
The heart of JWalk: synthesise test input values on demand; try to assure an even spread of inputs for a given type; by default, supply monotonic sequences of values.
MasterGenerator: the built-in ObjectGenerator is fairly comprehensive; it synthesises basic values, arrays, standard objects, etc.
CustomGenerator: take control of how particular types are synthesised; provide custom generators and add them to a master as delegates; eg StringGenerator, EnumGenerator, InterfaceGenerator.
Custom Generators
Choose a location: the default is the test class directory.
Choose a generator: enter the generator directly, or browse within a package.
Click add/remove: add a custom generator to the list, or remove a generator from the list.
CustomGenerator Interface
Provide a generator class with:

public boolean canCreate(Class<?> type);
public Object nextValue(Class<?> type);
public void setOwner(MasterGenerator master);

Key points: it advertises which types it can synthesise; it generates a sequence of objects on demand; it may keep a handle to its owning master. eg InterfaceGenerator maps interface types onto concrete classes and invokes nextValue recursively (on its master).
Example: IndexGenerator
public class IndexGenerator implements CustomGenerator {
    private int seed = 1;
    private boolean flag = false;

    // specific to the int index type
    public boolean canCreate(Class<?> type) {
        return type == int.class;
    }

    // creates repeating pairs of indices
    public Object nextValue(Class<?> type) {
        if (flag) { flag = false; return seed++; }
        else { flag = true; return seed; }
    }

    // null-op: ignores the master generator
    public void setOwner(MasterGenerator master) {}
}
When are they Useful?
IndexGenerator: generates repeating pairs of indices; exercises put/get pairs in vector and array types.
StdIOGenerator: redirects System.in, System.out to conventional files; tests programs with I/O using prepared data in files.
FileGenerator: takes control of filenames and streams (security); tests programs using prepared data in files.
Arbitrary test set-up: take control of how the environment is established.