TRANSCRIPT
Search-Based Testing
Phil McMinn
University of Sheffield, UK
Overview
How and why Search-Based Testing works
Examples
Search-Based Test Data Generation
Future Directions
Testability Transformation
Input Domain Reduction
Empirical and Theoretical Studies
Temporal, Functional, Structural
Applications in Mutation Testing
Test suite prioritisation
Acknowledgements
The material in some of these slides has kindly been provided by:
Mark Harman (KCL/UCL)
Joachim Wegener (Berner & Mattner)
Conventional Testing
manual design of test cases / scenarios - the hard part...
laborious, time-consuming
tedious
difficult! (where are the faults?)
Random Test Data Generation
[figure: inputs sampled at random from the input domain]
Search-Based Testing is an automated search of a potentially large input space
the search is guided by a problem-specific 'fitness function'
the fitness function guides the search to the test goal
Fitness Function
The fitness function scores different inputs to the system according to the test goal:
which ones are 'good' (that we should develop/evolve further)
which ones are useless (that we can forget about)
Fitness-guided search
[figure: fitness plotted against input; the search moves towards higher-fitness regions of the input domain]
First publication on SBST
Webb Miller and David Spooner
Automatic Generation of Floating-Point Test Data. IEEE Transactions on Software Engineering, 1976
Winner of the 2009 Accomplishment by a Senior Scientist Award of the International Society for Computational Biology
2009 Time 100: Scientists and Thinkers
Publications since 1976
source: SEBASE publications repository http://www.sebase.org
International Search-Based Testing Workshop
co-located with ICST
http://sites.google.com/site/icst2011workshops/
http://sites.google.com/site/icst2011/
4th event at ICST 2011 next year
check websites for submission deadlines etc.:
International Symposium on Search-Based Software Engineering
Benevento, Italy. 7th-9th September 2010
www.ssbse.org
International Symposium on Search-Based Software Engineering
2011: Co-location with FSE. Szeged, Hungary
www.ssbse.org
Phil McMinn - General Chair
Myra Cohen and Mel Ó Cinnéide - Program Chairs
Fitness Functions
Often easy
We often define metrics
Need not be complex
Daimler Temporal Testing
[figure: deceleration plotted against time, between tmin and tmax; the optimal point in time for triggering the airbag igniter]
Fitness = duration
J. Wegener and M. Grochtmann. Verifying timing constraints of real-time systems by means of evolutionary testing. Real-Time Systems, 15(3):275– 298, 1998.
Conventional testing: manual design of test cases / scenarios - laborious, time-consuming, tedious, difficult! (where are the faults?)
Search-Based Testing: automatic - it may sometimes be time-consuming, but it is not a human's time being consumed
Search-Based Testing: a good fitness function will lead the search to the faults
Generating vs Checking
Conventional Software Testing Research: write a method to construct test cases
Search-Based Testing: write a fitness function to determine how good a test case is
Daimler Autonomous Parking System
[figure: parking scenario parameters - psi, gap, dist2space, space length, space width]
[figure: fitness of parking scenarios across the input space - Generation 0, Generation 10, Generation 20; scenarios become critical, and eventually a collision is found]
Test setup
Usual approach to testing: manually generated input situations are fed to the software under test running in a simulation environment, and the output of the test is inspected
Search-Based Testing approach: a search algorithm automatically generates inputs; outputs are converted to fitness values and fed back to the search algorithm in a feedback loop
O. Bühler and J. Wegener: Evolutionary functional testing. Computers & Operations Research, 2008. Elsevier.
O. Buehler and J. Wegener. Evolutionary functional testing of an automated parking system. In International Conference on Computer, Communication and Control Technologies and The 9th. International Conference on Information Systems Analysis and Synthesis, Orlando, Florida, USA, 2003.
Structural testing
the fitness function analyses the outcome of decision statements and the values of variables in predicates
More later ...
Assertion testing
assertion condition: speed < 150mph
fitness function: f = 150 - speed
fitness is minimised; if f is zero or less, a fault is found
Bogdan Korel, Ali M. Al-Yami: Assertion-Oriented Automated Test Data Generation. ICSE, 1996: pp. 71-80
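As an illustration of how small such a fitness function can be, here is a minimal Python sketch (illustrative only - the slide gives just the assertion and the formula f = 150 - speed):

def assertion_fitness(speed):
    # Fitness for the assertion "speed < 150": the search minimises this value.
    # A result of zero or less means an input has been found that violates the
    # assertion, i.e. a fault has been exposed.
    return 150 - speed

# Example: an input with speed = 155 gives fitness -5, a violating input.
assert assertion_fitness(155) <= 0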
Search Techniques
Hill Climbing
[figure: fitness plotted against input; the search repeatedly moves to a better neighbouring input]
No better solution in neighbourhood: stuck at a local optimum
Hill Climbing - Restarts
[figure: fitness plotted against input; restarting the climb from new random positions allows it to escape local optima]
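A minimal Python sketch of hill climbing with random restarts, assuming the caller supplies fitness, neighbours and random_input functions (these names are illustrative, not from the slides); fitness is minimised and zero means the test goal is reached:

def hill_climb_with_restarts(fitness, neighbours, random_input, restarts=10):
    best = None
    for _ in range(restarts):
        current = random_input()                 # restart from a random position
        while True:
            better = [n for n in neighbours(current) if fitness(n) < fitness(current)]
            if not better:
                break                            # no better neighbour: local optimum
            current = min(better, key=fitness)   # move to the best neighbour
        if best is None or fitness(current) < fitness(best):
            best = current
        if fitness(best) == 0:                   # test goal reached
            break
    return best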
Simulated Annealing
[figure: fitness plotted against input; the search occasionally moves to worse solutions]
Worse solutions temporarily accepted
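The acceptance rule that makes this possible is small; a Python sketch (minimisation; the temperature and cooling schedule are left to the caller):

import math, random

def accept(current_fitness, candidate_fitness, temperature):
    # Always accept improvements; accept worse candidates with a probability
    # that shrinks as the temperature cools, allowing escape from local optima.
    if candidate_fitness <= current_fitness:
        return True
    return random.random() < math.exp((current_fitness - candidate_fitness) / temperature)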
Evolutionary Algorithm
[figure: fitness plotted against input; a population of candidate inputs converges on high-fitness regions]
Evolutionary Algorithms
inspired by Darwinian Evolution and concept of survival of the fittest
Crossover
Mutation
Crossover
[figure: two parent input vectors (a, b, c, d) = (10, 10, 20, 40) and (20, -5, 80, 80); a crossover point splits each vector and the tails are exchanged to form two offspring]
Mutation
[figure: an input vector (a, b, c, d) = (10, 10, 20, 40) has one of its values randomly changed]
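One common way to realise these operators on input vectors, sketched in Python (the slides do not prescribe exact operators; single-point crossover and small random perturbations are assumptions):

import random

def single_point_crossover(parent1, parent2):
    # Exchange the tails of two input vectors at a random crossover point.
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def mutate(individual, probability=0.25, step=10):
    # Randomly perturb each value with a small probability.
    return [v + random.randint(-step, step) if random.random() < probability else v
            for v in individual]

# e.g. crossing the two parent vectors from the figure
child1, child2 = single_point_crossover([10, 10, 20, 40], [20, -5, 80, 80])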
Evolutionary Testing
[figure: the evolutionary loop - Selection, Crossover, Mutation, Fitness Evaluation, Insertion, End?; fitness evaluation executes and monitors the test cases]
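A Python sketch of the loop shown on the slide, with the operators passed in as functions (all parameter names and defaults are illustrative assumptions):

def evolutionary_testing(fitness, random_individual, select, crossover, mutate,
                         population_size=50, generations=100):
    population = [random_individual() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)                # fitness evaluation
        if fitness(ranked[0]) == 0:                             # end? target covered
            return ranked[0]
        offspring = []
        while len(offspring) < population_size:
            parent1, parent2 = select(ranked), select(ranked)   # selection
            child1, child2 = crossover(parent1, parent2)        # crossover
            offspring += [mutate(child1), mutate(child2)]       # mutation
        population = offspring[:population_size]                # insertion
    return min(population, key=fitness)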
Which search method?
Depends on characteristics of the search landscape
Some landscapes are hard for some searches but easy for others... and vice versa...
more on this later...
Ingredients for an optimising search algorithm
Ingredients for Search-Based Testing
Representation: a method of encoding all possible inputs - usually straightforward, since inputs are already in data structures
Fitness Function: transformation of the test goal to a numerical function; numerical values indicate how 'good' an input is
Neighbourhood: part of our understanding of the problem - we need to know our near neighbours
More search algorithms
Tabu Search
Estimation of Distribution Algorithms
Particle Swarm Optimisation
Ant Colony Optimisation
Genetic Programming
SBST Surveys & Reviews
Phil McMinn: Search-based software test data generation: a survey. Software Testing, Verification and Reliability 14(2): 105-156, 2004
Wasif Afzal, Richard Torkar and Robert Feldt: A Systematic Review of Search-based Testing for Non-Functional System Properties. Information and Software Technology, 51(6):957-976, 2009
Shaukat Ali, Lionel Briand, Hadi Hemmati and Rajwinder Panesar-Walawege: A Systematic Review of the Application and Empirical Investigation of Search-Based Test-Case Generation. IEEE Transactions on Software Engineering, To appear, 2010
Getting started in SBSE
M. Harman and B. Jones: Search-based software engineering. Information and Software Technology, 43(14):833-839, 2001.
M. Harman: The Current State and Future of Search Based Software Engineering, In Proceedings of the 29th International Conference on Software Engineering (ICSE 2007), 20-26 May, Minneapolis, USA (2007)
D. Whitley: An overview of evolutionary algorithms: Practical issues and common pitfalls. Information and Software Technology, 43(14):817–831, 2001.
More applications of Search-Based Testing
Mutation Testing
original
Mutation Testing
original
mutant
mutant
Mutation Testing
mutant
mutant
Mutation Testing
mutant
mutant
higher order mutant
Mutation Testing
mutant
mutant
higher order mutant
Fewer Equivalent Mutants
A. J. Offutt. Investigations of the software testing coupling effect. ACM Transactions on Software Engineering and Methodology 1(1):5-20, Jan. 1992.
Finding Good HOMs
Due to the large number of potential HOMs, finding the ones that are most valuable is hard
We want:
Subtle mutants - HOMs that are hard to kill, corner cases where undiscovered faults reside
Reduced test effort - HOMs that subsume as many first-order mutants as possible
Subtle Mutants - Fitness Function
ratio of the killability of the HOM to the killability of the constituent FOMs
> 1 HOM is weaker than FOMs
< 1 HOM is stronger than FOMs
= 0, potential equivalent mutant
Y. Jia and M. Harman. Constructing subtle faults using higher order mutation testing. In 8th International Working Conference on Source Code Analysis and
Manipulation (SCAM 2008), Beijing, China, 2008. IEEE Computer Society.
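One way to compute that ratio, sketched in Python under the assumption that kill results are represented as sets of killing test cases (the representation and names are assumptions, not taken from Jia and Harman's paper):

def hom_fitness(tests_killing_hom, tests_killing_each_fom):
    # Ratio of the killability of the HOM to the killability of its constituent
    # first-order mutants: > 1 means the HOM is weaker than its FOMs, < 1 means
    # it is stronger (subtler), and 0 flags a potential equivalent mutant.
    fom_killers = set().union(*tests_killing_each_fom) if tests_killing_each_fom else set()
    if not fom_killers:
        return float("inf")
    return len(tests_killing_hom) / len(fom_killers)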
Time aware test suite prioritisation
K. R. Walcott, M. L. Soffa, G. M. Kapfhammer, and R. S. Roos. Time aware test suite prioritization. In Proc. ISSTA, pages 1-11, 2006.
Suppose a 12 minute time budget
Order by considering the number of faults that can be detected
Order by considering the time only
Average % of faults detected
Intelligent heuristic search: same no. of faults found, less time
Fitness Function
The tester is unlikely to know the location of faults, so we need to estimate how likely a test is to find defects
% of code coverage is used to estimate a suite's potential, combined with the time taken to execute the test suite
Results
Mutations used to seed faults
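A hedged Python sketch of such a fitness function - this only illustrates combining the coverage estimate with execution time (higher is better), and is not the exact formulation used by Walcott et al.:

def ordering_fitness(test_order, coverage, duration, time_budget):
    # Reward the coverage achieved by the tests that fit within the time budget,
    # and penalise orderings that consume more of the budget.
    # `coverage` maps a test to the set of code elements it covers;
    # `duration` maps a test to its execution time.
    elapsed, covered = 0.0, set()
    for test in test_order:
        if elapsed + duration[test] > time_budget:
            break
        elapsed += duration[test]
        covered |= coverage[test]
    return len(covered) - elapsed / time_budget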
Multi-objective Search
Instead of combining the objectives into one fitness function, handle them as distinct goals, e.g. Time and Faults
[figure: candidate solutions plotted against Fitness Function A and Fitness Function B; the non-dominated solutions form the Pareto Front]
The Pareto Front
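In code, the Pareto front is just the set of non-dominated solutions; a small Python sketch (assuming both objectives are to be maximised):

def dominates(a, b):
    # a dominates b if it is at least as good on every objective
    # and strictly better on at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(solutions):
    # Keep only the solutions that no other solution dominates.
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions if other is not s)]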
Some Results
Pareto Efficient Multi-Objective Test Case Selection. Shin Yoo and Mark Harman. Proceedings of the ACM International Symposium on Software Testing and Analysis (ISSTA 2007), pp. 140-150
Three objectives
Other applications of Search-Based Testing
Combinatorial Interaction Testing
GUI Testing
M.B. Cohen, M.B. Dwyer and J. Shi, Interaction testing of highly-configurable systems in the presence of constraints, International Symposium on Software Testing and Analysis
(ISSTA), London, July 2007, pp. 129-139.
X. Yuan, M.B. Cohen and A.M. Memon, GUI Interaction Testing: Incorporating Event Context, IEEE Transactions on Software Engineering, to appear
Other applications of Search-Based Testing
Stress testing
State machine testing
L. C. Briand, Y. Labiche, and M. Shousha. Stress testing real-time systems with genetic algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference
(GECCO 2005), pages 1021–1028, Washington DC, USA, 2005. ACM Press.
K. Derderian, R. Hierons, M. Harman, and Q. Guo. Automated unique input output sequence generation for conformance testing of FSMs. The Computer Journal,
39:331–344, 2006.
Search-Based Structural Test Data Generation
Covering a structure
[figure: control flow graph with a TARGET node]
Fitness evaluation
The test data executes the 'wrong' path
Analysing control flow
The outcomes at key decision statements matter. These are the decisions on which the target is control dependent
Approach Level
[figure: control flow graph with the TARGET; the approach level counts the control-dependent decisions away from the target at which execution diverged - 2, then 1, then 0 as execution gets closer - so it is minimised]
Roy P. Pargas, Mary Jean Harrold, Robert Peck: Test-Data Generation Using Genetic Algorithms. Softw. Test., Verif. Reliab. 9(4): 263-282 (1999)
Joachim Wegener, André Baresel, Harmen Sthamer: Evolutionary test environment for automatic structural testing. Information & Software Technology 43(14): 841-854 (2001)
Analysing predicates
Approach level alone gives us coarse values
a = 50, b = 0; a = 45, b = 5; a = 40, b = 10; a = 35, b = 15; a = 30, b = 20; a = 25, b = 25
getting 'closer' to being true
Branch distance
Associate a distance formula with different relational predicates
a = 50, b = 0: branch distance = 50
a = 45, b = 5: branch distance = 40
a = 40, b = 10: branch distance = 30
a = 35, b = 15: branch distance = 20
a = 30, b = 20: branch distance = 10
a = 25, b = 25: branch distance = 0
getting 'closer' to being true
Branch distances for relational predicates
Nigel Tracey, John Clark & Keith Mander. The Way Forward for Unifying Dynamic Test-Case Generation: The Optimisation-Based Approach. 1998
Bogdan Korel. Automated Software Test Data Generation. IEEE Trans. Software Eng. 1990
Webb Miller & D. Spooner. Automatic Generation of Floating-Point Test Data. IEEE Trans. Software Eng. 1976
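The usual distance formulas, in the style of Tracey et al., sketched in Python (K is a small positive constant added when the predicate is false; exact variants differ slightly between authors):

K = 1

def branch_distance(op, a, b):
    # Distance to making the relational predicate `a op b` true: zero when it
    # already holds, and smaller as the operands get 'closer' to satisfying it.
    if op == "==":
        return abs(a - b)
    if op == "!=":
        return 0 if a != b else K
    if op == "<":
        return 0 if a < b else (a - b) + K
    if op == "<=":
        return 0 if a <= b else (a - b) + K
    if op == ">":
        return 0 if a > b else (b - a) + K
    if op == ">=":
        return 0 if a >= b else (b - a) + K
    raise ValueError("unknown operator: " + op)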
Putting it all together
[figure: nested decisions 'if a >= b', 'if b >= c', 'if c >= d', whose true branches lead to the TARGET]
Fitness = approach level + normalised branch distance
the normalised branch distance lies between 0 and 1 and indicates how close the current approach level is to being penetrated
TARGET MISSED at 'if a >= b': approach level = 2, branch distance = b - a
TARGET MISSED at 'if b >= c': approach level = 1, branch distance = c - b
TARGET MISSED at 'if c >= d': approach level = 0, branch distance = d - c
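Putting the two measures together in code is direct; a Python sketch (normalise is any function mapping a branch distance into [0, 1), such as those on the next slide):

def structural_fitness(approach_level, branch_distance, normalise):
    # Number of control-dependent decisions still to be passed, plus how close
    # the decision where execution diverged was to taking the desired outcome.
    return approach_level + normalise(branch_distance)

# e.g. an input that diverges at 'if a >= b' in the example above gets
# fitness 2 + normalise(b - a)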
Normalisation Functions
Since the 'maximum' branch distance is generally unknown, we need a non-standard normalisation function
Baresel (2000): norm(d) = 1 - alpha^(-d), with alpha = 1.001
Arcuri (2010): norm(d) = d / (d + beta), with beta = 1
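In Python (a sketch; the slide itself only names the constants, but these are the functions usually cited for Baresel and Arcuri):

def normalise_baresel(distance, alpha=1.001):
    # Maps a branch distance >= 0 into [0, 1).
    return 1 - alpha ** (-distance)

def normalise_arcuri(distance, beta=1.0):
    # Maps a branch distance >= 0 into [0, 1).
    return distance / (distance + beta)

# e.g. normalise_baresel(10) is approximately 0.009945, matching the worked
# fitness value shown later for the instrumented example.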
Alternating Variable Method - 'Probe' moves
void fn( input1, input2, input3 .... )
[figure: each input variable in turn is probed with small decrease and increase moves]
Alternating Variable Method
[figure: fitness plotted against an input variable's value; an accelerated hill climb takes progressively larger steps in the direction of improvement]
Alternating Variable Method
1. Randomly generate start point: a=10, b=20, c=30
2. 'Probe' moves on a: a=9, b=20, c=30 and a=11, b=20, c=30 - no effect
3. 'Probe' moves on b: a=10, b=19, c=30 - improved branch distance
4. Accelerated moves in the direction of improvement
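A compact Python sketch of the method (restarts and other refinements omitted; parameter names are illustrative):

def alternating_variable_method(fitness, start, max_cycles=1000):
    current = list(start)
    best = fitness(current)
    for _ in range(max_cycles):
        improved = False
        for i in range(len(current)):
            for direction in (-1, 1):                  # probe moves on variable i
                step = direction
                candidate = list(current)
                candidate[i] += step
                while fitness(candidate) < best:       # accelerated moves
                    current, best = candidate, fitness(candidate)
                    improved = True
                    step *= 2                          # double the step size
                    candidate = list(current)
                    candidate[i] += step
        if best == 0 or not improved:                  # target covered, or stuck
            break
    return current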
Key Publications
Bogdan Korel. Automated Software Test Data Generation. IEEE Trans. Software Eng. 1990 (the alternating variable method)
Evolutionary structural test data generation: S. Xanthakis, C. Ellis, C. Skourlas, A. Le Gall, S. Katsikas, and K. Karapoulios. Application of genetic algorithms to software testing (Application des algorithmes génétiques au test des logiciels). In 5th International Conference on Software Engineering and its Applications, pages 625-636, Toulouse, France, 1992
A search-based test data generator tool
IGUANA - Input Generation Using Automated Novel Algorithms (Java)
[figure: IGUANA architecture - the search algorithm generates inputs for the test object (C code compiled to a DLL); instrumentation feeds information from the test object back through the Java Native Interface for fitness computation]
A function for testing
[example: nested decisions 'if a == b' and 'if b == c' guarding 'return 1' - the TARGET]
Test Object Preparation
1. Parse the code and extract the control dependency graph: "which decisions are key for the execution of individual structural targets?"
2. Instrument the code, for monitoring control flow and variable values in predicates
3. Map inputs to a vector (a, b, c)
Straightforward in many cases; inputs composed of dynamic data structures are harder to compose
Kiran Lakhotia, Mark Harman and Phil McMinn. Handling Dynamic Data Structures in Search-Based Testing. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2008), Atlanta, Georgia, USA, July 12-16 2008, pp. 1759-1766, ACM Press.
Instrumentation
Each branching condition is replaced by a call to the function node(...)
the instrumentation should only observe the program and not alter its behaviour
The first parameter is the control flow graph node ID of the decision statement
The second parameter is a boolean condition that replicates the structure in the original program (i.e. including short-circuiting)
Relational predicates are replaced with functions that compute branch distance
The instrumentation tells us which decision nodes were executed and their outcome (branch distances)
Therefore we can find the decision at which control flow diverged from a target for an input...
... and compute the approach level from the control dependence graph
... and look up the branch distance
Fitness value for an input
[example: for 'if a == b', 'if b == c' guarding the TARGET 'return 1']
Input: <20, 20, 30>
NODE   T    F
1      0    1
2      10   0
4      20   0
....
Diverged at node 2: approach level 0, branch distance 10
fitness = 0.009945219
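A Python sketch of the same instrumentation idea (the slides instrument C code compiled to a DLL; this is only an illustration, and `trace` is an assumed structure for recording the observations):

trace = {}   # node id -> (decision outcome, branch distance at that decision)

def node(node_id, outcome, branch_distance):
    # Record what happened at this decision without changing its outcome,
    # so the instrumented program behaves exactly like the original.
    trace[node_id] = (outcome, branch_distance)
    return outcome

def equals_distance(a, b):
    # Branch distance formula for the predicate a == b.
    return abs(a - b)

def instrumented(a, b, c):
    # original: if (a == b) { if (b == c) { /* TARGET */ return 1; } }
    if node(1, a == b, equals_distance(a, b)):
        if node(2, b == c, equals_distance(b, c)):
            return 1   # TARGET
    return 0

# e.g. instrumented(20, 20, 30) records node 1 taken true (distance 0) and a
# divergence at node 2 with branch distance 10, as in the worked example above.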
Testability Transformation
The ‘Flag’ Problem
Program Transformation
Programs will inevitably have features that heuristic searches handle less well
Testability transformation: change the program to improve test data generation... whilst preserving test adequacy
Mark Harman, Lin Hu, Rob Hierons, Joachim Wegener, Harmen Sthamer, Andre Baresel and Marc Roper. Testability Transformation. IEEE Transactions on Software Engineering. 30(1): 3-16, 2004.
Nesting
Phil McMinn, David Binkley and Mark Harman. Empirical Evaluation of a Nesting Testability Transformation for Evolutionary Testing. ACM Transactions on Software Engineering and Methodology, 18(3), Article 11, May 2009
Testability Transformation
Note that the programs are no longer equivalent
But we don't care - so long as the test data is still adequate
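To make the nesting problem concrete: an illustrative Python sketch (not the transformation from the paper) of why a nested target starves the search of guidance, and how an un-nested fitness exposes both branch distances at once; norm is any normalisation function such as those shown earlier:

def fitness_nested(a, b, c, norm):
    # Guidance for the inner predicate (b == c) only becomes available once the
    # outer predicate (a == b) is already true: a plateau for the search.
    if a == b:
        return 0 + norm(abs(b - c))     # approach level 0
    return 1 + norm(abs(a - b))         # approach level 1

def fitness_unnested(a, b, c, norm):
    # After a nesting-style transformation, both distances can be measured on
    # every execution, so progress towards b == c counts even while a != b.
    return norm(abs(a - b)) + norm(abs(b - c))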
Nesting & Local Optima
[figure: how nesting shapes the fitness landscape and creates local optima]
Results - Industrial & Open source code
[chart: change in success rate after applying the transformation (%), per nested branch]
Dependent & Independent Predicates
Independent: predicates influenced by disjoint sets of input variables, e.g. 'a == 0' and 'b == 0' - can be optimised in parallel
Dependent: predicates influenced by non-disjoint sets of input variables, e.g. 'a == b' and 'b == c' - interactions between predicates inhibit parallel optimisation
[chart: change in success rate after applying the transformation (%), per nested branch, with branches grouped into those with dependent predicates and those with independent and some dependent predicates]
When not preserving program equivalence can go wrong
we are testing to cover structure
... but the structure is the problem
so we transform the program
... but this alters the structure
so we need to be careful: are we still testing according to the same criterion?
Input Domain Reduction
Mark Harman, Youssef Hassoun, Kiran Lakhotia, Phil McMinn and Joachim Wegener. The Impact of Input Domain Reduction on Search-Based Test Data Generation. The 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2007), Cavtat, Croatia, September 3-7 2007, pp. 155-164, ACM Press.
Effect of Reduction
[figure: three input variables, each ranging over -100,000 ... 100,000, give an input domain of approx 10^16 points; after reduction to the single relevant variable, only 200,001 values remain]
Variable Dependency Analysis
[figure: analysis of which input variables the target actually depends on]
Empirical Study
Case studies: Defroster, F2, Gimp, Spice, Tiff
Studied the effects of reduction with: Random Search, Alternating Variable Method, Evolutionary Testing
Effect on Random Testing
[figure: three input variables, each ranging over -49 ... 50]
Probability of executing target: 100 x 100 x 1 versus 100 x 100 x 100
Results with Random Testing
Results with AVM
Effect on AVM
Saves probe moves (and thus wasteful fitness evaluations) around irrelevant variables
void fn( irrelevant1, irrelevant2, irrelevant3, required1 .... )
[figure: after reduction, probe moves are applied only to the required variable]
Effect on ET
Saves mutations on irrelevant variables
Mutations concentrated on the variables that matter
Likely to speed up the search
Results with ET
Conclusions for Input Domain Reduction
Variable dependency analysis can be used to reduce input domains
This can reduce search effort for the AVM and ET
Perhaps surprisingly, there is no overall change for random search
Which search algorithm?
Empirical Study
Case studies: Bibclean, Defroster, F2, Eurocheck, Gimp, Space, Spice, Tiff, Totinfo
Mark Harman and Phil McMinn.A Theoretical and Empirical Study of Search Based Testing: Local, Global and Hybrid Search.
IEEE Transactions on Software Engineering, 36(2), pp. 226-247, 2010.
760 branches in ~5 kLOC
Interesting branches
20 98
[chart: Success Rate (%) of Evolutionary Testing versus Hill Climbing (the Alternating Variable Method) on branches from F2, space (space_seqrotrg), spice (clip_to_circle, cliparc) and totinfo (InfoTbl)]
Wins for the AVM
[chart: average number of fitness evaluations (log scale, 1 to 100,000) for Evolutionary Testing versus Hill Climbing on branches from gimp (gimp_rgb_to_hsl, gimp_rgb_to_hsv, gimp_rgb_to_hsv_int, gradient_calc_* functions), spice (cliparc) and tiff (TIFF_SetSample)]
Wins for the AVM
When does the AVM win?
[figure: fitness plotted against input]
[chart: Success Rate of Evolutionary Testing versus Hill Climbing on branches of check_ISBN and check_ISSN]
Wins for Evolutionary Testing
When does ET win?
The branches in question were part of a routine for validating ISBN/ISSN strings
When a valid character is found, a counter variable is incremented
Evolutionary algorithms incorporate a population of candidate solutions and crossover
Crossover enables valid characters to be crossed over into different ISBN/ISSN strings
Schemata
1010100011110000111010
1111101010000000101011
0001001010000111101011
Subsets of useful genes, e.g. substrings of 1's in a binary 'all ones' problem
The schema theory predicts that schemata of above-average fitness will proliferate in subsequent generations of the evolutionary search
Schemata
1**1 *11*
1111
crossover of fit schemata can lead to even fitter, higher order schemata
Royal Roads
landscape structures where there is a ‘red carpet’ for crossovers to follow
The Genetic Algorithm Royal Road
S1:  1111****************************
S2:  ****1111************************
S3:  ********1111********************
S4:  ************1111****************
S5:  ****************1111************
S6:  ********************1111********
S7:  ************************1111****
S8:  ****************************1111
S9:  11111111************************
S10: ********11111111****************
S11: ****************11111111********
S12: ************************11111111
S13: 1111111111111111****************
S14: ****************1111111111111111
S15: 11111111111111111111111111111111
When Crossover Helps
[figure: a tree of test inputs (P1-P4, Q1-Q2, R1-R2); inputs each containing 1 valid character are crossed over to give inputs containing 2 valid characters, then 4, until an input executes the target]
[chart: Success Rate of Evolutionary Testing compared with the Headless Chicken Test on branches of check_ISBN and check_ISSN]
Headless Chicken Test
T. Jones, "Crossover, macromutation and population-based search," Proc. ICGA '95. Morgan Kaufmann, 1995, pp. 73-80.
Investigations into Crossover
Royal Roads
HIFF
Real Royal Roads
Ignoble Trails
M. Mitchell, S. Forrest, and J. H. Holland, “The royal road for genetic algorithms: Fitness landscapes and GA performance,” Proc. 1st European Conference on Artificial Life. MIT Press, 1992, pp. 245–254.
R. A. Watson, G. S. Hornby, and J. B. Pollack, “Modeling building-block interdependency,” Proc. PPSN V. Springer, 1998, pp.97–106.
T. Jansen and I. Wegener, “Real royal road functions - where crossover provably is essential,” Discrete Applied Mathematics, vol. 149, pp. 111–125, 2005.
J. N. Richter, A. Wright, and J. Paxton, “Ignoble trails - where crossover is provably harmful,” Proc. PPSN X. Springer, 2008, pp. 92–101.
Evolutionary Testing Schemata
{(a, b, c) | a = b}: e.g. (50, 50, 25), (100, 100, 10), ...
{(a, b, c) | a > 0}: e.g. (50, 10, 25), (100, -50, 10), ...
Crossover of good schemata
{(a, b, c) | a = b} and {(a, b, c) | b ≥ 100} are subschemata - building blocks
crossover can combine them into the superschema {(a, b, c) | a = b ∧ b ≥ 100}
{(a, b, c) | a = b ∧ b ≥ 100} and {(a, b, c) | c ≤ 10} can in turn be combined into the covering schema {(a, b, c) | a = b ∧ b ≥ 100 ∧ c ≤ 10}
What types of program and program structure enable Evolutionary Testing to perform well through crossover, and how?
P. McMinn, How Does Program Structure Impact the Effectiveness of the Crossover Operator in Evolutionary Testing? Proc. Symposium on Search-Based Software Engineering, 2010
1. Large numbers of conjuncts in the input condition
{(a, b, c...) | a = b ∧ b ≥ 100 ∧ c ≤ 10 ...
each conjunct represents a 'sub' test data generation problem that can be solved independently and combined with other partial solutions
2. Conjuncts should reference disjoint sets of variables
{(a, b, c, d ...) | a = b ∧ b = c ∧ c = d ...
when variables are shared between conjuncts, solving each conjunct independently does not necessarily result in an overall solution
Progressive Landscape
[figure: fitness plotted against input]
Crossover - Conclusions
1. Large numbers of conjuncts in the input condition
2. Conjuncts should reference disjoint sets of variables
Crossover lends itself to programs/units that process large data structures (e.g. strings, arrays), resulting in input condition conjuncts with disjoint variables
... or units that require large sequences of method calls to move an object into a required state
e.g. testing for a full stack - push(...), push(...), push(...)
Other Theoretical Work
A. Arcuri, P. K. Lehre, and X. Yao, “Theoretical runtime analyses of search algorithms on the test data generation for the triangle classification problem,” SBST workshop 2008, Proc. ICST 2008. IEEE, 2008, pp. 161–169.
A. Arcuri, “Longer is better: On the role of test sequence length in software testing,” Proc. ICST 2010. IEEE, 2010.
Future directions...
The Oracle Problem
Determining the correct output for a given input is called the oracle problem
Software engineering research has devoted much attention to automated oracles
... but many systems do not have automated oracles
Human Oracle Cost
Typically the responsibility falls on the human
Reducing this human effort - the human oracle cost - remains an important problem
Test data generation and human oracle cost
Quantitative: generate test data that maximises coverage but minimises the number of test cases; reduce the size of test cases
A. Leitner, M. Oriol, A. Zeller, I. Ciupa, and B. Meyer. Efficient unit test case minimization. ASE 2007, pp. 417-420. ACM.
M. Harman, S. G. Kim, K. Lakhotia, P. McMinn, and S. Yoo. Optimizing for the number of tests generated in search based test data generation with an application to the oracle cost problem. 3rd Search-Based Software Testing workshop (SBST 2010), 2010. IEEE digital library.
Test data generation and human oracle cost
Qualitative: e.g. how easily can the scenario comprising generated test data be understood, so that the output can be evaluated?
Phil McMinn, Mark Stevenson and Mark Harman. Reducing Qualitative Human Oracle Costs associated with Automatically Generated Test Data. Proceedings of the 1st International Workshop on Software Test Output Validation (STOV 2010), to appear.
Calendar program
Takes two dates (represented by 3 integers each)
Finds the number of days between the dates
Some example dates generated: -4048/-10854/-29141, 10430/3140/6733, 3063/31358/8201
Machine-generated test data tends not to fit the operational profile of a program particularly well
Seeding knowledge
Programmer test case used as the starting point of the search process
16/1/2010, 2/1/2009, 2/32/2010
In what ways can operational profile knowledge be obtained?
Programmers: likely to have run the program with a sanity check; these test cases can be seeded to bias the test data generation process
The Program: identifier names, code comments, sanitisation routines
Identifier names
Give clues to the sort of inputs that might be expected: dayOfTheWeek, country_of_origin, url
Can be analysed in conjunction with large-scale natural language lexicons such as WordNet
Sanitisation Routines
Defensive programming routines might be used to ‘correct’ a program’s own test data
Test Data Re-use
Program similarity and test data re-use: code structure, identifier names, code comments
(related techniques: clone detection, plagiarism detection)
Will these techniques work?
Will they compromise fault-finding capability?
Questions & Discussion