an empirical study of advanced - mcmaster university
1
Advanced
Topics
Combining S. M. T
2
an empirical study of
predicate dependence levels
and trends
david binkley
mark harman
ICSE 2003
Or …
How to use slicing to measure
testability
3
overview
evolutionary testing
variable dependence
empirical study programs
results
implications
4
DaimlerChrysler approach to test data generation
Test Evaluation
Test Execution Monitoring
Specification Program
Test Organization
Test Documentation
Test Planning
Test Case Design by Means of EA
Selection
Reinsertion
Recombination
Mutation
Evaluation
the starting point for the study was evolutionary test data generation
5
Target
6
Target
Level 4
Level 3
Level 2
Level 1
control dependence analysis
approximation level
7
fitness = approximation level + local distance
if A = B: local distance = | A - B |
Target
Level 4
Level 3
Level 2
Level 1
control dependence analysis
approximation level
local distance calculation
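The fitness rule on this slide (approximation level plus local distance) can be sketched in C. This is a minimal sketch, not the DaimlerChrysler implementation: the function names are hypothetical, and it assumes that fitness is minimised, so a value of 0 means the target branch condition A = B is reached and satisfied.

```c
/* Local distance for a branch condition of the form (a == b):
 * zero when the condition holds, growing as a and b move apart. */
static double local_distance_eq(int a, int b)
{
    long long diff = (long long)a - (long long)b;  /* avoid int overflow */
    return (double)(diff < 0 ? -diff : diff);
}

/* Fitness of one test input, to be minimised (assumption).
 * approximation_level counts the relevant branching statements,
 * found by control dependence analysis, that still lie between
 * the point where execution left the path and the target. */
static double fitness(int approximation_level, int a, int b)
{
    return (double)approximation_level + local_distance_eq(a, b);
}
```

For example, an input that diverges two levels away from the target with A = 3 and B = 7 scores 2 + 4 = 6, while an input that reaches the target with A equal to B scores 0.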
8
these are the inputs
but suppose we target this predicate … which variables matter?
but wait...
so we must keep y
variable d is not mentioned
so c is killed
of the original 7 variables only 3 matter
search space reduction is exponential
x, y, and z seem to be killed
search space reduction
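To make "search space reduction is exponential" concrete, the sketch below counts the search space in bits. It assumes that every input variable is an independent integer of a fixed width; the helper names and the 32-bit width used in the note are illustrative assumptions, not figures from the study.

```c
/* Size of the input search space, in bits, assuming every input
 * variable is an independent integer of bits_per_var bits. */
static unsigned search_space_bits(unsigned num_vars, unsigned bits_per_var)
{
    return num_vars * bits_per_var;
}

/* Bits removed from the search space by ignoring irrelevant
 * variables; the space shrinks by a factor of 2^(result). */
static unsigned bits_saved(unsigned all_vars, unsigned relevant_vars,
                           unsigned bits_per_var)
{
    return search_space_bits(all_vars, bits_per_var)
         - search_space_bits(relevant_vars, bits_per_var);
}
```

With 7 inputs of which only 3 matter, and 32-bit integers, the space drops from 224 bits to 96 bits, a reduction by a factor of 2^128.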
9
foo(int x1,…,int xn)
…
if p(a1,x1,a3,g4,x4)
…
if q(a1,x5)
…
int a1,…,am;
int g1,…,gk;
q(a1,x5)
how many matter ?
how many matter ?
as n increases?
as k increases?
10
an empirical study
for a typical predicate
how much of input space is relevant ?
the question
why care?
reduce search space
implications for slicing… slice size for the predicate is closely related
comprehension
how easy is it to understand the decision ?
impact analysis
how much do inputs affect ?
measurement
how cohesive ?
11
inputs
formal parameters
globals in scope
du-globals
transitively defined or used in the predicate’s procedure
12
Name        KLoC   Predicates  Description
replace      0.6       84      Regular expression string replacement
copia        1.2       45      ESA signal processing code
compress     1.9       88      Data compression utility
which        5.4      136      Unix utility
barcode      5.9      365      Barcode generator
space       11.5      593      ESA ADL interpreter
ed          13.6     2627      Unix editor
prepro      14.8      619      ESA array pre-processing code
bc          16.8      432      Calculator
findutils   18.6     1819      File finding utilities
diffutils   19.8     2212      File comparing routines
flex2-4-7   21.2      911      BSD scanner (version 2.4.7)
flex2-5-4   21.5     1306      BSD scanner (version 2.5.4)
espresso    22.1     2497      Logic simplification for CAD
ijpeg       28.2     2038      JPEG compressor
ntpd        45.6     3061      Daemon for the network time protocol
a2ps        62.9     5984      PostScript formatter
gnugo       81.7     5244      GNU game player
sendmail    85.7     4991      Linux mail daemon
spice      167.0    15527      Digital circuit simulator
the programs studied
13
data collection
modified HRB slicing algorithm
implemented using CodeSurfer
formals and du-globals become parameters
thanks to GrammaTech for CodeSurfer
14
terminology
max parameters
parameters used
the maximum number of parameters a predicate could depend upon
the number of parameters a predicate actually depends upon
15
result data
three results to summarise in one data point:
max parameters
parameters used
number of predicates summarised
we adopted two diagrammatic techniques
16
dependence skyline diagram
listed in ascending order of max formals
8 predicates have max formals of 3 and use 2.2 on average
skylines give a feeling for size reduction… but precise reading is lost
evident savings for prepro
notice the trend for dependent proportion to drop
replace prepro
17
dependence bubble chart
max parameters = parameters used
trend line
prepro
replace
formal parameters
angle shows the size reduction
x-axis: max formals
y-axis: formals used
bubble at (x,y) of size s: there are s predicates which have x max formals available to them; these s predicates depend on average on y formals
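The bubble definition can be read as a small aggregation over per-predicate records. The sketch below is illustrative only: the struct layouts and the function name are hypothetical, and it is not the tooling used in the study.

```c
#include <stddef.h>

/* One record per predicate (hypothetical layout). */
struct predicate_dep {
    int max_formals;   /* formals the predicate could depend upon */
    int formals_used;  /* formals it actually depends upon */
};

/* One bubble of the chart. */
struct bubble {
    int    x;     /* max formals */
    double y;     /* average formals used */
    size_t size;  /* number of predicates summarised */
};

/* Summarise all predicates that have the given max_formals value. */
static struct bubble make_bubble(const struct predicate_dep *preds,
                                 size_t n, int max_formals)
{
    struct bubble b = { max_formals, 0.0, 0 };
    long total_used = 0;

    for (size_t i = 0; i < n; i++) {
        if (preds[i].max_formals == max_formals) {
            total_used += preds[i].formals_used;
            b.size++;
        }
    }
    if (b.size > 0)
        b.y = (double)total_used / (double)b.size;
    return b;
}
```

A bubble that lies on the line "max parameters = parameters used" has y equal to x; the further it falls below that line, the larger the size reduction.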
18
over all predicates
in
all programs
19
good news
as max formals increases
the dependent proportion falls
good news for evolutionary testing
and also for all the other applications
perhaps it is not good for cohesion
well almost
20
declining dependent proportion
all predicates where max formals < 11
all predicates where max formals > 10
The number of formals depended upon, as a function of max formals available, is piecewise linear
21
declining dependent proportion
max formals < 11 max formals > 10
22
bad news
no such correlation for globals
globals could entail untestability using search
also bad for other applications
is this more evidence that globals are bad?
23
dependence trends
Max formals
Max du globals
24
implications
formals:
as the problem gets worse
the solution gets better
25
implications
globals:
large global variable lists
present problems
hard to generate test data
hard to understand
high levels of dependence
26
implications
cohesion:
functions with
large numbers of formals
may not be so cohesive
27
algorithm performance
the algorithm used has complexity O(P*S),
where P is the number of predicates and S is the size of the procedure
PII 450 with 386 MB memory
analysis time per procedure = 0.017 * P * S ms
average analysis time 7.6 ms per predicate
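The timing formula can be applied directly to estimate the cost for a given procedure. The sketch below simply evaluates it; the function name is hypothetical, the constant is specific to the machine reported on this slide, and S is taken in whatever unit the study used for procedure size.

```c
/* Estimated analysis time for one procedure, in milliseconds,
 * using the constant reported for the PII 450 test machine. */
static double analysis_time_ms(unsigned predicates, unsigned procedure_size)
{
    const double ms_per_unit = 0.017;  /* from the slide; machine-specific */
    return ms_per_unit * (double)predicates * (double)procedure_size;
}
```

Under this formula a procedure with 10 predicates and a size of 100 would take about 17 ms to analyse.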
28
some possible
interpretations
what do you
read into
these diagrams ?
29
flex evolution of du globals
version 2.4.7 version 2.5.4
30
profiles
sendmail du globals: big bubbles top right
findutils du globals: big bubbles bottom left
31
conclusions
falling formal dependent proportion
invariant global dependent proportion
analysis is worth performing
diagrams may be an aid to
understanding
evaluation
monitoring evolution
32
Testability Transformation
Overview
or…
How to use slicing and transformation to improve testing
Test Data Generation
Evolutionary Test Data Generation
The Flag Problem
Flag Removal Algorithm
Initial Results
Testability Transformation
Other ‘non meaning-preserving’ transformations
If time permits
33
Automatic Test Generation
We know that generating good quality test data is hard
and knowing what good quality means is hard
I do not propose to answer that question today
Starting point: structural test adequacy criterion
Specifically: that some branch is to be covered
and that we are going to use evolutionary testing
34
Distance Based Fitness
Target
Level 4
Level 3
Level 2
Level 1
1. Approximation level
Relevant branching statements can lead to a miss of the desired target
2. Local distance calculation in the branching statements with undesired branching
Evaluation of predicate in a branching condition in the same manner as described for safety testing, e.g.
if A = B: Local_Distance = max - (| A - B |)
Fitness = Approximation_Level + Local_Distance
35
The Flag Problem
if (A==0) ...
Suppose we want to make this true
fitness = Max - (| A-0 |) = 10 - (| A-0 |)
[plot: fitness (0 to 10) against value of A (-10 to 10)]
flag = A==0;
if (flag) ...
[plot: fitness (0 to 10) against value of A (-10 to 10)]
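The two fitness landscapes on this slide can be compared with a small C sketch. It assumes, as the slide does, that higher fitness is better and that Max is 10. Clamping values of A outside the plotted range and the function names are assumptions made for illustration.

```c
#define MAX_FITNESS 10

/* |a|, clamped to MAX_FITNESS so that inputs outside the plotted
 * range -10..10 score 0 instead of going negative (assumption). */
static int clamped_abs(int a)
{
    long long v = a;
    if (v < 0)
        v = -v;
    return v > MAX_FITNESS ? MAX_FITNESS : (int)v;
}

/* Fitness for the direct test  if (A == 0): Max - |A - 0|.
 * The score rises steadily as A approaches 0. */
static int fitness_direct(int a)
{
    return MAX_FITNESS - clamped_abs(a);
}

/* Fitness for  flag = (A == 0); if (flag).
 * The search only sees a boolean, so every wrong A scores the same. */
static int fitness_flag(int a)
{
    int flag = (a == 0);
    return flag ? MAX_FITNESS : 0;
}
```

With the direct test, A = 3 scores 7 and A = -9 scores 1, so the search is guided towards 0. With the flag, both score 0 and the search gets no guidance until it happens to hit A = 0.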
36
Flag Landscape
Large plateau of low fitness
Tiny plateau of high fitness
Transform fitness function to transform landscape
Better: transform program to transform landscape
37
Informally
A transformation is a partial function on programs
which preserves meaning, of some kind or other
We need to pair the program and test adequacy criterion
– call this the test pair
A testability transformation is a partial function on test pairs
such that...
38
Testability Transformation
Test data which is adequate for the transformed test pair is adequate for the original test pair
39
Testability Transformation Paradox
We are testing to cover structure
… but the structure is the problem
So we transform the program
… and this alters the structure
So we need to be careful:
Are we still testing according to the same criterion?
Our transformations will preserve coverage of
Statements, branches, MC/DC
Future work: define a semantics to verify this
40
This is not abstract interpretation
To preserve branch coverage:
if (e) skip; else skip;
Cannot be transformed to skip;
But the program
if (e) x=1; else x=2;
Can be transformed to
if (e) skip; else skip;
41
Flag TT
Transform the program to remove flags
Not always possible
but worth doing where possible
Our approach uses
Simple transformations (fairly well known)
Simple amorphous slicing (perhaps less well known)
Substitute flag use with definition
Brief overview of amorphous slicing...
42
Flag Removal Transformation
flag = n < 4;
…
if (n%2==0) flag = 0;
…
if (a[i] != '0' && flag)
...
Suppose we want to make flag true here
Suppose n is an unsigned integer
What initial values of n will achieve this?
flag = (n%2==0)?0:(n<4);
n' = n;
flag = (n'%2==0)?0:(n'<4);
if (a[i] != '0' && ((n'%2==0)?0:(n'<4)))
Once we have this
We can keep the original flag assignment code
Claim:
Adding the new flag assignment leaves adequacy criteria invariant
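The substitution on this slide can be checked with a small C sketch. It compares the flag as computed by the original assignments with the single conditional expression that replaces the flag at its use. The function names and the temporary `n_saved` (standing in for the saved copy of n) are hypothetical.

```c
/* Flag value at the point of use, computed as in the original code. */
static int flag_original(unsigned n)
{
    int flag = n < 4;
    if (n % 2 == 0)
        flag = 0;
    return flag;
}

/* The same value as one conditional expression over n_saved,
 * a temporary copy of n taken where the flag is defined. */
static int flag_substituted(unsigned n)
{
    unsigned n_saved = n;
    return (n_saved % 2 == 0) ? 0 : (n_saved < 4);
}
```

Both functions agree for every n. For an unsigned n the flag is true only when n is odd and less than 4, that is for n = 1 and n = 3. Once the expression appears directly in the predicate, a distance-based fitness can be computed from n instead of from a boolean.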
43
Simple Flag Removal Algorithm
For loop free flag definition code
Bush
Blossom
Slice leaf sequences
Convert to conditional assignment
Add temporary variables
Substitute definition for use
44
Bushing
Produces a binary tree
[figure: tree with nodes p, q, r, a, b, c, d]
45
Blossoming
Moves all actions to the leaves
[figure: tree with nodes p, q, r, a, b, c, d after blossoming]
All internal nodes will be predicates
Original predicates may be altered
46
[figure: tree after repeated blossoming]
Blossoming is repeated for all internal action nodes
Some leaf assignments are not to the flag variable
… these can be removed using slicing
Now all internal nodes are predicates
47
[figure: tree after amorphous slicing]
Amorphous slicing gives single assignment at leaves
Now all leaves are single assignments
… and they all assign to the flag variable
Some predicates change several times during blossoming
Sometimes syntax-preserving slicing does this too
But sometimes syntax-preserving slicing leaves several assignments
We assumed freedom from side effects
Fortunately we have a side effect removal transformation
48
Initial Empirical Analysis
Daimler ran their Evolutionary Test Data
Generator on both versions
Collected coverage information for 6 runs
49
Special Values
/* date correction for september 1752 */
if (special_days)
    result = "Day did not exist.";
else if (leapflag && is_september && day>13)
    result = dayName((addMonths(month,year) + (--day) + firstJanuary(year) + 10) % 7);
else ...
Remove flags
/* date correction for september 1752 */
if (year==1752 && month==9 && day>=3 && day<=13)
    result = "Day did not exist.";
else if (year==1752 && month==9 && day>13)
    result = dayName((addMonths(month,year) + (--day) + firstJanuary(year) + 10) % 7);
else ...
With flags we never even got up here
With flags it took longer to get anywhere
50
Nothing special flag
returnflag = (a==0 || b==0 || c==0) || a>10000 || b>10000 || c>10000 || (c>=a+b) || (a>=b+c) || (b>=a+c);
...
if (returnflag) return;
Remove flags
...
if ((a==0 || b==0 || c==0) || a>10000 || b>10000 || c>10000 || (c>=a+b) || (a>=b+c) || (b>=a+c))
    return;
There is very little difference
Is it hard to find values which make this flag true?
Is it hard to find values which make this flag false?
To have a problem we need a flag for which few inputs make it true/false
Let’s look at a bad flag problem ...
51
Special Value Flag
returnflag = (a==99999 || b==99999 || c==99999);
...
if (returnflag) return;
...
if (a==99999 || b==99999 || c==99999) return;
There are relatively few values which make the flag true
So we seldom randomly hit this statement
This predicate has a smoother landscape
… than this one
So we get better coverage
… at less cost
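A C sketch can show why the flag-free predicate gives the search a smoother landscape. The slides do not say how a disjunction is scored. Taking the smallest of the individual distances is one common choice and is an assumption here, as are the function names and the convention that a distance of 0 means the branch is taken.

```c
#define SPECIAL 99999

/* Absolute difference computed in a wider type to avoid overflow. */
static long long abs_diff(int x, int y)
{
    long long d = (long long)x - (long long)y;
    return d < 0 ? -d : d;
}

/* Distance to making (a==SPECIAL || b==SPECIAL || c==SPECIAL) true,
 * scored as the smallest distance of the three equalities. */
static long long distance_direct(int a, int b, int c)
{
    long long best = abs_diff(a, SPECIAL);
    long long db = abs_diff(b, SPECIAL);
    long long dc = abs_diff(c, SPECIAL);
    if (db < best)
        best = db;
    if (dc < best)
        best = dc;
    return best;
}

/* With returnflag, only true or false is visible to the search. */
static long long distance_flag(int a, int b, int c)
{
    int returnflag = (a == SPECIAL || b == SPECIAL || c == SPECIAL);
    return returnflag ? 0 : 1;
}
```

For the input (99990, 5, 7) the direct predicate reports a distance of 9, so a small change to a is rewarded. The flag version reports 1 for every input that misses, however close it is.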
52
Disposable Transformations
We generate test data using the transformed program
because it is easier
… then throw away the transformed program
Transformation as a means to an end not an end in itself
Do the transformations even need to preserve meaning?
This is a radically new form of transformation
53
Conclusion
Test data generation is hard
… anything which helps is good
Test data generation can be impeded by structure
… so transform the structure
We have to be sure to preserve branch coverage
… but not traditional meaning
This suggests a new kind of transformation
Testability Transformation
54
References
• David Binkley and Mark Harman. Analysis and Visualization of Predicate Dependence on Formal Parameters and Global Variables. IEEE Transactions on Software Engineering. (journal version of ICSE 2003 paper, to appear)
• Mark Harman, Lin Hu, Rob Hierons, Joachim Wegener, Harmen Sthamer, Andre Baresel and Marc Roper. Testability Transformation. IEEE Transactions on Software Engineering, 30(1): 3-16, 2004.
• David Binkley and Mark Harman. An Empirical Study of Predicate Dependence Levels and Trends. 25th IEEE/ACM International Conference on Software Engineering (ICSE 2003), 3-10 May 2003, Portland, Oregon, USA, pages 330-339.
Electronic copies of these papers are available on my website.