Testing: not just error detection
Helps to evaluate actual performance of the system
Need to test specifications and assumptions about the environment
Need to validate the performance
  Compare actual performance to the worst-case analysis
  Compare actual performance to the expected performance
Cannot always test the system!
E.g. recovery actions for a power plant failure
Simulation techniques try to imitate the environment in such cases
There are limits to what simulations can do
When should testing start?
When the system is fully implemented? NO!
Many potential problems that designers and coders were aware of during development are forgotten
The testing task is just too formidable at this point
The first rule of testing (D. Hamlet, J. Maybee)
Get out the test plan and follow it
Many aspects that need to be tested arise at requirements and design stages
  • They need to be documented
  • Testing of these aspects should be thought through
What should testing accomplish?
Software testing has to verify that software meets its requirements
Too naïve! Cannot verify!
E.g. a requirement can say “for all inputs, the program will…”
There may be infinitely many possible inputs
Realistically, software testing has to find failures
Important terminology (IEEE standards)
Failure
  The software does something contrary to its specification
Fault
  Something in the program code from which a failure can arise
What about bug and defect?
  They usually refer to faults
You observe failures and find faults
How does testing work?
It’s not a simple question. The answer depends on many parameters
The nature of the module being tested
  E.g. scientific computations vs. UI components
What type of validation we are looking for
  Functionality
    • Behaves correctly
    • Looks correct
  Performance
  Interoperability
The way that testing is performed
  Viewing the module being tested as a black or white box
The goals of testing
  Looking for bugs
  Proving to the users that the system “works”
…
How does testing work? (cont.)
[Diagram: object of testing, run-time environment, inputs, outputs, observations, test specification, oracle, outcome of testing]
Types of testing, depending on purpose of testing
Unit (or module) testing
  Individual components are tested independently
  E.g. class testing
  E.g. module testing
Integration testing
  Testing interfaces between several subsystems
System testing
  Testing the complete system
Acceptance testing
  Testing with the data supplied by the users
  May be done in the presence of users
Regression testing
  Testing an incremental version of the system
How do we select test cases?
By the number of test cases
  E.g. N random test cases
Based on some properties of the system
By their ability to detect faults
Test selection criteria
There is an abstract domain D of all possible test cases (all possible inputs to the program or module)
Let T be a subset of D
A test selection criterion is a predicate that specifies whether test set T is in some sense “enough” to test the program or module
Two uses for testing criteria:
  Stopping rule - know when the system has been tested enough
  Test data evaluation rule - evaluates quality of selected test cases
Several testing criteria may be used at the same time
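The idea of a criterion as a predicate over a test set can be sketched in C++. This is only an illustration: the names `TestSet`, `Criterion`, `at_least_n`, `covers_sign_classes` and `satisfies_all` are hypothetical, and the domain D is assumed to be plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model: the domain D is int, a test set T is a vector of inputs.
using TestSet = std::vector<int>;
using Criterion = std::function<bool(const TestSet&)>;

// Criterion 1: "T contains at least n test cases".
Criterion at_least_n(std::size_t n) {
    return [n](const TestSet& t) { return t.size() >= n; };
}

// Criterion 2: "T contains a negative, a zero and a positive input".
bool covers_sign_classes(const TestSet& t) {
    bool neg = false, zero = false, pos = false;
    for (int x : t) {
        if (x < 0) neg = true;
        else if (x == 0) zero = true;
        else pos = true;
    }
    return neg && zero && pos;
}

// Several criteria used at the same time: T is "enough" only if all hold.
bool satisfies_all(const TestSet& t, const std::vector<Criterion>& criteria) {
    for (const Criterion& c : criteria) {
        if (!c(t)) return false;
    }
    return true;
}
```

Used as a stopping rule, testing continues until `satisfies_all` returns true; used as an evaluation rule, the same predicate grades an existing test set.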
Ideal test selection criterion
A test selection criterion is ideal if for any test set T that satisfies this criterion, T detects all errors, if any, in the program/module
Of course, it is desirable that T is manageable in size, so that testing does not take forever
In general, only the test criterion that requires running all tests in D is ideal
A test selection criterion can be useful even if not ideal
A test criterion is useful if, for any T that satisfies this criterion, finding no errors when running the tests in T means the program/module is highly reliable
At present, no particularly good testing criteria exist
Or at least, none of the existing ones have been proved particularly good
Test data selection techniques
Random
Interface based
Fault based
  Error seeding (mutation testing)
  Fault constraints
Error based
  Domain and computation based
Coverage based
  Control flow
  Data flow
Random testing
Based on a description of the test data, randomly select test cases
Provides a statistical model of the reliability of the system
If the system fails on one test case out of 100, expect it to perform correctly about 99% of the time
Confidence in prediction increases as the number of test cases increases
In practice, it has proved to be a reasonable testing strategy, especially if the results can be evaluated automatically
Alternative testing techniques should be compared to random testing
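A minimal sketch of random testing with automatic evaluation, under stated assumptions: `abs_under_test` is a made-up unit with a deliberately seeded fault, `abs_oracle` is a trusted reference, and `random_test` and `RandomTestResult` are hypothetical names. The observed success rate is the kind of statistical estimate described above.

```cpp
#include <cassert>
#include <random>

// Hypothetical unit under test: absolute value with a seeded fault at x == -7.
int abs_under_test(int x) {
    if (x == -7) return -7;          // deliberate fault for illustration
    return x < 0 ? -x : x;
}

// Oracle: a trusted reference so results can be evaluated automatically.
int abs_oracle(int x) { return x < 0 ? -x : x; }

struct RandomTestResult {
    int runs;
    int failures;
    // Fraction of random test cases on which the unit agreed with the oracle.
    double observed_success_rate() const {
        return runs == 0 ? 0.0 : 1.0 - static_cast<double>(failures) / runs;
    }
};

// Draw `runs` inputs uniformly from [lo, hi] and compare against the oracle.
RandomTestResult random_test(int runs, int lo, int hi, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(lo, hi);
    RandomTestResult r = {runs, 0};
    for (int i = 0; i < runs; ++i) {
        int x = dist(gen);
        if (abs_under_test(x) != abs_oracle(x)) ++r.failures;
    }
    return r;
}
```

The choice of input distribution matters: a range that never includes -7 reports a 100% success rate even though the fault is present.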
Black box vs. white box testing
Black box testing
  Test case selection does not take the structure of the system into account
  Usually test cases are selected based on the types of inputs
White box testing
  Test case selection is done by analyzing the structure/composition of the system
Equivalence partitioning (a black box approach)
[Diagram: all possible inputs, representative inputs, object of testing, outputs]
How is domain partitioning done?
Based on the requirements for the system
E.g., if a system deals with controlling a pressure valve for a steam engine
Opens the valve if the pressure in the tank exceeds a certain threshold HIGH
Closes the valve if the pressure in the tank drops below a certain threshold LOW
All non-negative real numbers are partitioned into:
  p <= LOW
  LOW < p < HIGH
  p >= HIGH
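The partitioning can be sketched as a table of representative inputs. Everything concrete here is an assumption: the threshold values, the `Hold` action between the thresholds, and the names `control_valve`, `PartitionCase` and `count_failures`. The two boundary values are listed separately because "exceeds" and "drops below" leave their treatment open to interpretation.

```cpp
#include <cassert>

// Hypothetical threshold values; the slides only name them LOW and HIGH.
const double LOW = 2.0;
const double HIGH = 8.0;

enum class Action { Close, Hold, Open };

// Hypothetical controller following the stated requirement:
// open when the pressure exceeds HIGH, close when it drops below LOW.
Action control_valve(double p) {
    if (p > HIGH) return Action::Open;
    if (p < LOW) return Action::Close;
    return Action::Hold;   // assumed behaviour between the thresholds
}

struct PartitionCase {
    double pressure;
    Action expected;
};

// One representative per equivalence class plus the two boundary values.
const PartitionCase kCases[] = {
    {0.5, Action::Close},   // p below LOW
    {LOW, Action::Hold},    // boundary: p == LOW has not dropped below LOW
    {5.0, Action::Hold},    // LOW < p < HIGH
    {HIGH, Action::Hold},   // boundary: p == HIGH does not exceed HIGH
    {9.5, Action::Open},    // p above HIGH
};

// Number of cases on which the controller disagrees with the expectation.
int count_failures() {
    int failures = 0;
    for (const PartitionCase& c : kCases) {
        if (control_valve(c.pressure) != c.expected) ++failures;
    }
    return failures;
}
```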
White-box (program-based) Test Data Selection
structural
  coverage based
fault-based
  e.g., mutation testing, RELAY
error-based
  domain and computation based
  use representations created by symbolic execution
Coverage Criteria
control-flow adequacy criteria
G = (N, E, S, T) where
  the nodes N represent executable instructions (statement or statement fragment);
  the edges E represent the potential transfer of control;
  S is a designated start node;
  T is a designated final node
  E = { (ni, nj) | syntactically, the execution of nj follows the execution of ni }
Control-Flow-Graph-Based Coverage Criteria
  Statement Coverage
  Branch Coverage
  Hidden Paths
  Path Coverage
  Loop Guidelines
  Boundary - Interior
Selecting paths that satisfy the criteria
static selection
  some of the associated paths may be infeasible
dynamic selection
  monitors coverage and displays areas that have not been satisfactorily covered
Statement Coverage
requires that each statement in a program be executed at least once
only about 1/3 of NASA statements were executed before software was released (Stucki 1973)
usually can achieve 85% coverage easily, but why not 100%?
  unreachable code
  dead code
  complex sequence (should be tested!)
Microsoft reports 80-90% code coverage
Coincidental Correctness
Executing a statement does not guarantee that a fault on that path will be revealed
Y := X + 2
Y := X ** 2
If X = 2 then the fault is not exposed (both expressions yield 4)
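A small sketch of coincidental correctness. It assumes, purely for illustration, that `X + 2` is the intended computation and `X ** 2` the faulty one; the slides do not say which is which. The function names are hypothetical.

```cpp
#include <cassert>

// Assumed for illustration: the intended computation is Y := X + 2 ...
int intended(int x) { return x + 2; }

// ... and the faulty code computes Y := X ** 2 instead.
int faulty(int x) { return x * x; }

// A test input exposes the fault only if the two computations disagree on it.
bool exposes_fault(int x) { return intended(x) != faulty(x); }
```

Inputs 2 and -1 both execute the faulty statement without revealing it, since x * x equals x + 2 exactly at those two values; any other integer input exposes the fault.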
Branch Coverage
Requires that each branch in a program (each edge in a control flow graph) be executed at least once
e.g., Each predicate must evaluate to each of its possible outcomes
Branch coverage is stronger than statement coverage
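Why branch coverage is stronger can be seen on a tiny control flow graph. The sketch below is an assumption-laden illustration: `ControlFlowGraph`, `statement_coverage`, `branch_coverage` and `if_without_else` are hypothetical names, and the graph models an `if` statement with no `else` part.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

typedef std::pair<int, int> Edge;

// Hypothetical control flow graph: nodes are integers, edges are (from, to) pairs.
struct ControlFlowGraph {
    std::set<int> nodes;
    std::set<Edge> edges;
};

// Fraction of nodes (statements) visited by the executed paths.
double statement_coverage(const ControlFlowGraph& g,
                          const std::vector<std::vector<int> >& paths) {
    std::set<int> seen;
    for (const auto& p : paths)
        for (int n : p)
            if (g.nodes.count(n)) seen.insert(n);
    return g.nodes.empty() ? 1.0 : static_cast<double>(seen.size()) / g.nodes.size();
}

// Fraction of edges (branches) traversed by the executed paths.
double branch_coverage(const ControlFlowGraph& g,
                       const std::vector<std::vector<int> >& paths) {
    std::set<Edge> seen;
    for (const auto& p : paths)
        for (std::size_t i = 0; i + 1 < p.size(); ++i) {
            Edge e(p[i], p[i + 1]);
            if (g.edges.count(e)) seen.insert(e);
        }
    return g.edges.empty() ? 1.0 : static_cast<double>(seen.size()) / g.edges.size();
}

// Graph for "if (c) S;": 1 = predicate, 2 = S, 3 = exit.
ControlFlowGraph if_without_else() {
    ControlFlowGraph g;
    g.nodes = {1, 2, 3};
    g.edges = {Edge(1, 2), Edge(2, 3), Edge(1, 3)};
    return g;
}
```

The single path 1, 2, 3 executes every statement but never takes the edge (1, 3), so it satisfies statement coverage without satisfying branch coverage.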
Hidden Path (branch) Coverage
Requires that each condition in a compound predicate be tested
Example:
( X > 1 ) OR ( Y < 2 )
Test Data:
X = 2, Y = 5 -> T
X = 1, Y = 5 -> F
but the true branch is never tested for data where Y < 2.
Condition combinations:
( X > 1 )   ( Y < 2 )
    T           F
    F           T
    T           T
    F           F
[Diagram: the compound predicate expanded into two decisions, X > 1 and Y < 2, each with T and F branches]
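The gap between branch coverage and condition coverage can be checked mechanically. This sketch assumes the compound predicate is `(X > 1) OR (Y < 2)`, which is what the test data in the example implies; `Input`, `covers_branches` and `covers_conditions` are hypothetical names. Each condition is evaluated on its own here, ignoring short-circuit evaluation.

```cpp
#include <cassert>
#include <vector>

struct Input { int x; int y; };

// The compound predicate from the example, assuming the operator is OR.
bool predicate(const Input& in) { return in.x > 1 || in.y < 2; }

// True if both outcomes of the whole predicate are produced by the test set.
bool covers_branches(const std::vector<Input>& tests) {
    bool seen_true = false, seen_false = false;
    for (const Input& t : tests) {
        if (predicate(t)) seen_true = true; else seen_false = true;
    }
    return seen_true && seen_false;
}

// True if each individual condition takes both the value true and false.
bool covers_conditions(const std::vector<Input>& tests) {
    bool x_t = false, x_f = false, y_t = false, y_f = false;
    for (const Input& t : tests) {
        if (t.x > 1) x_t = true; else x_f = true;
        if (t.y < 2) y_t = true; else y_f = true;
    }
    return x_t && x_f && y_t && y_f;
}
```

The two test inputs from the example satisfy `covers_branches` but not `covers_conditions`; adding an input such as X = 1, Y = 1 closes the gap.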
Loop Coverage
Path 1, 2, 1, 2, 3 executes all branches (and all statements) but does not execute the loop well.
[Diagram: control flow graph with nodes 1, 2, 3]
Path Coverage
Requires that every executable path in the program be executed at least once
In most programs, path coverage is impossible
Example:
  read N;
  SUM := 0;
  for I = 1 to N do
    read X;
    SUM := SUM + X;
  endfor
How do we choose a set of paths?
Typical Guidelines for loop coverage
fall through case
minimum number of iterations
minimum + 1 number of iterations
maximum number of iterations
maximum - 1 number of iterations
[Diagram: control flow graph with nodes 1, 2, 3]
Paths:
  1, 2, 3
  1, 2, 1, 2, 3
  1, 2, 1, 2, 1, 2, 3
  . . .
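The guidelines translate into a short list of iteration counts to test. The sketch below applies them to a summation loop like the one in the path coverage example. `sum_first` and `loop_iteration_counts` are hypothetical names, and the maximum iteration count is an assumed bound chosen by the tester, since the loop itself has none.

```cpp
#include <cassert>
#include <vector>

// A summation loop like the path coverage example: adds the first n values.
int sum_first(const std::vector<int>& values, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += values[i];
    return sum;
}

// Iteration counts suggested by the guidelines for a loop that can run
// between min_iter and max_iter times. With min_iter == 0 the minimum
// is also the fall through case.
std::vector<int> loop_iteration_counts(int min_iter, int max_iter) {
    std::vector<int> counts;
    const int candidates[] = {min_iter, min_iter + 1, max_iter - 1, max_iter};
    for (int c : candidates) {
        if (c < min_iter || c > max_iter) continue;        // outside the legal range
        bool already = false;
        for (int k : counts) if (k == c) already = true;   // skip duplicates
        if (!already) counts.push_back(c);
    }
    return counts;
}
```

For a loop bounded by 0 and 5 iterations this selects 0, 1, 4 and 5, which is a far smaller set than the paths needed for full path coverage.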
Boundary - Interior Criteria
boundary test of a loop causes the loop to be entered but not iterated
interior test of a loop causes a loop to be entered and then iterated at least once
both boundary and interior tests are to be selected for each unique path through the loop
[Diagram: control flow graph with nodes 1 through 8]
Paths for Example
Boundary paths
  1,2,3,5,7  a
  1,2,3,6,7  b
  1,2,4,5,7  c
  1,2,4,6,7  d
Interior paths
  a,b  a,c  a,d  b,a  b,c  b,d  c,a
  x,x for x = a, b, c, d
Validating Object Oriented Systems
Do OO systems make validation harder or easier?
Does code reuse lead to validation reuse?
Do we need to change existing techniques?
Do we need to develop new techniques?
Issues in O-O testing
basic unit for unit testing
implications of encapsulation
implications of inheritance
implications of genericity
implications of polymorphism/dynamic binding
implications for integration testing
Unit testing
[Diagram: driver(s), unit, stub(s), oracle]
test scaffolding
  can be created for general or for specific tests
  is composed of
    one or more drivers
      • provide a prototype activation environment
      • drivers initialize non-local variables and parameters and call the unit
    one or more stubs
      • provide a prototype of the units used by the program to be tested
    one or more oracles
      • identify the tests that cause failures
Unit Testing Object-Oriented systems
procedural programming
  basic component: subroutine
  testing method: subroutine input/output based
object-oriented programming
  basic component: class = data structure + set of operations
  objects are instances of classes
  data structure defines the state of the object, thus correctness is not based only on output, but also on the state
  data structure is not directly accessible, but can only be accessed using the access methods (encapsulation)
Example
class Watcher {
private:
  ...
  int Current_Value;
  int Last_Value;
  int Status;
  void check_pressure();
  void alarm(int);
public:
  void start();
};
void Watcher::check_pressure() {
  ...
  Last_Value = Current_Value;
  Current_Value = reactor.temperature;
  if (Current_Value > NORMAL)
    if (Current_Value - Last_Value > 20)
      if (Status == 2)
        alarm(3);      // start critical situation alarm
      else {
        Status = 2;    // set status to warning level
        alarm(2);      // send warning signal
      }
    else ...           // other possible situations
}
value produced by method check_pressure depends on the state of class Watcher (variable Last_Value)
failures due to incorrect values of Last_Value can be revealed only by tests that have control on that variable
example by Mauro Pezze & Michael Young ©1998
Basic Unit for Testing
the class is the natural unit for unit test case design
methods are meaningless apart from their class
testing a class instance (an object) can verify a class in isolation
when individually verified classes are used to create more complex classes in an application system, the entire subsystem must be tested as a whole before it can be considered to be verified (integration testing)
Encapsulation
not a source of errors but may be an obstacle to testing
how to get at the concrete state of an object?
break the encapsulation
  using features of the languages
    • C++ friend
    • Ada95 Child Unit
  use low level probes or debug tools to manually inspect
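A minimal sketch of breaking encapsulation with a C++ `friend`, loosely modelled on the Watcher example. It is not the original class: `WatcherTester`, `observe`, the recorded `last_alarm_` field and the threshold `kNormal = 100` are all assumptions made so the example is self-contained.

```cpp
#include <cassert>

class WatcherTester;  // hypothetical test class, not part of the original example

class Watcher {
public:
    // Simplified stand-in for start(): feeds one reading to the private check.
    void observe(int reading) { check_pressure(reading); }
private:
    friend class WatcherTester;        // grants test code access to private state
    static constexpr int kNormal = 100;  // assumed threshold
    int current_value_ = 0;
    int last_value_ = 0;
    int status_ = 0;
    int last_alarm_ = 0;               // records the alarm level instead of signalling

    void alarm(int level) { last_alarm_ = level; }
    void check_pressure(int reading) {
        last_value_ = current_value_;
        current_value_ = reading;
        if (current_value_ > kNormal && current_value_ - last_value_ > 20) {
            if (status_ == 2) {
                alarm(3);              // critical situation
            } else {
                status_ = 2;           // warning level
                alarm(2);
            }
        }
    }
};

// The tester sets and inspects the concrete state directly.
class WatcherTester {
public:
    static void set_state(Watcher& w, int current, int status) {
        w.current_value_ = current;
        w.status_ = status;
    }
    static int status(const Watcher& w) { return w.status_; }
    static int last_alarm(const Watcher& w) { return w.last_alarm_; }
};
```

With control over the previous value and the status, a single call is enough to reach either alarm branch, which would otherwise need a carefully constructed sequence of readings.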
How to get at the concrete state of an object? Use the abstraction
Scenarios--examine sequences of events
  State is implicitly inspected via access methods
Use or provide hidden functions to examine the state
  Useful for debugging throughout the life of the system
  But modified code may alter the behavior
  Especially true for languages that do not support strong typing
Implications of Inheritance
inherited features often require re-testing because a new context of usage results when features are inherited
multiple inheritance increases the number of contexts to test
specialization relationships
  implementation specialization should correspond to problem domain specialization
  reusability of superclass test cases depends on this
Which functions must be tested in a subclass?
Example
Parent class contains:
  foo(int x)
  range() - returns a number in the range 1 to 10 inclusive
Child class contains:
  range() - is redefined to return a number in the range 0 to 20 inclusive
  // foo() is inherited
foo contains the code:
  if (x < 0) x = x / range()
  return x
child::range has to be tested afresh
does child::foo() have to be retested?
  have to test: when x < 0, could divide by 0
child::foo() may not have to be completely retested if code in foo() doesn’t depend on range(); doesn’t call it nor call any code that indirectly calls it
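A sketch of why the inherited foo() needs retesting in the child. Two things are assumptions made for the sake of a safe, deterministic example: range() returns the lowest value of its documented range, and foo() throws `std::domain_error` on a zero divisor (integer division by zero is undefined behaviour in C++, so the hazard is made observable instead). `foo_survives` is a hypothetical helper.

```cpp
#include <cassert>
#include <stdexcept>

class Parent {
public:
    virtual ~Parent() {}
    // Assumed for illustration: returns the lowest value of the range 1..10.
    virtual int range() const { return 1; }

    int foo(int x) const {
        if (x < 0) {
            int d = range();   // dynamically bound: the child's range() may run here
            // Explicit check added so the failure is observable without
            // executing an undefined division by zero.
            if (d == 0) throw std::domain_error("division by zero in foo");
            x = x / d;
        }
        return x;
    }
};

class Child : public Parent {
public:
    // Redefined: lowest value of the range 0..20, which includes 0.
    int range() const override { return 0; }
    // foo() is inherited unchanged.
};

// Returns true if foo(x) completes on the given object.
bool foo_survives(const Parent& p, int x) {
    try {
        p.foo(x);
        return true;
    } catch (const std::domain_error&) {
        return false;
    }
}
```

The same test input passes against the parent and fails against the child, even though the code of foo() itself never changed.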
Can tests for a parent class be reused for a child class?
parent::range() and child::range() are two different functions with different specifications and implementations
tests are derived from the different specification and implementation
but the functions are likely to be similar, so the better the OO design, the greater the overlap
new tests are those for child::range requirements that are not satisfied by the parent::range tests
the simpler a test, the more likely it is to be reusable in subclasses
but simple tests tend to find only the simple faults
Example
Parent::describedSelf() is this code
  if (val < 0) message("Less")
  else if (val == 0) message("Equal")
  else message("More")
Child::describedSelf() is this code
  if (val < 0) message("Less")
  else if (val == 0) message("Zero Equal")
  else {
    message("More")
    if (val == 42) message("Jackpot")
  }
Tests: input, expected output, reuse for Child
  -1   Less    OK
   0   Equal   Change expected output to Zero Equal
   1   More    OK
  Add: 42  Jackpot
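The reuse of the parent test table can be sketched as follows. This is an illustration under assumptions: messages are returned as a list rather than printed so they can be checked, `Case`, `parent_cases`, `child_cases` and `count_failures` are hypothetical names, and the expected output for 42 is taken to be "More" followed by "Jackpot" because the child code emits both.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> Messages;

class Parent {
public:
    explicit Parent(int v) : val(v) {}
    virtual ~Parent() {}
    // Returns the messages instead of printing them so a test can check them.
    virtual Messages describedSelf() const {
        if (val < 0) return Messages{"Less"};
        else if (val == 0) return Messages{"Equal"};
        else return Messages{"More"};
    }
protected:
    int val;
};

class Child : public Parent {
public:
    explicit Child(int v) : Parent(v) {}
    Messages describedSelf() const override {
        if (val < 0) return Messages{"Less"};
        else if (val == 0) return Messages{"Zero Equal"};
        Messages out{"More"};
        if (val == 42) out.push_back("Jackpot");
        return out;
    }
};

struct Case { int input; Messages expected; };

// Parent test suite: one case per branch.
std::vector<Case> parent_cases() {
    std::vector<Case> cases;
    cases.push_back(Case{-1, Messages{"Less"}});
    cases.push_back(Case{0, Messages{"Equal"}});
    cases.push_back(Case{1, Messages{"More"}});
    return cases;
}

// Child suite: reuse the parent cases, change one expectation, add one case.
std::vector<Case> child_cases() {
    std::vector<Case> cases = parent_cases();
    cases[1].expected = Messages{"Zero Equal"};
    cases.push_back(Case{42, Messages{"More", "Jackpot"}});
    return cases;
}

// Runs a suite against class T and counts mismatches.
template <class T>
int count_failures(const std::vector<Case>& cases) {
    int failures = 0;
    for (const Case& c : cases) {
        if (T(c.input).describedSelf() != c.expected) ++failures;
    }
    return failures;
}
```

Running the unmodified parent suite against the child fails only on input 0, and nothing in the parent suite would ever reach the new branch for 42.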
Approaches to Inheritance Testing
flattening inheritance
  each subclass is tested as if all inherited features were newly defined
  tests used in the super-classes can be reused
  many tests are redundant
incremental testing
  reduce tests only to new/modified features
  determining what needs to be tested requires automated support
Polymorphism
in procedural programming, procedure calls are statically bound
each possible binding of a polymorphic component requires a separate set of test cases
may be hard to find all such bindings
may also complicate integration planning
Example
void resize(Shape polygon) {
  ...
  data = polygon.area();
  ...
}
[Diagram: class hierarchy with Shape as the parent of square, triangle, pentagon, ...]
Which implementation of area is actually called?
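A sketch of testing one case per possible binding. The classes and numbers are assumptions for illustration: `Square`, `Triangle`, `resized_area`, `BindingCase` and `binding_cases` are hypothetical, and the client takes the shape by reference because passing a `Shape` by value in C++ would slice the object and defeat dynamic binding.

```cpp
#include <cassert>
#include <memory>
#include <vector>

class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const = 0;
};

class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
private:
    double side_;
};

class Triangle : public Shape {
public:
    Triangle(double base, double height) : base_(base), height_(height) {}
    double area() const override { return 0.5 * base_ * height_; }
private:
    double base_, height_;
};

// Hypothetical client: the call to area() is bound at run time to the
// implementation of whatever concrete shape is passed in.
double resized_area(const Shape& polygon, double factor) {
    return polygon.area() * factor * factor;
}

struct BindingCase {
    std::shared_ptr<Shape> shape;
    double expected;   // expected result of resized_area(shape, 2.0)
};

// One test case per possible binding of polygon.area().
std::vector<BindingCase> binding_cases() {
    std::vector<BindingCase> cases;
    cases.push_back({std::make_shared<Square>(2.0), 16.0});         // area 4, times 4
    cases.push_back({std::make_shared<Triangle>(4.0, 3.0), 24.0});  // area 6, times 4
    return cases;
}
```

Every new subclass of Shape adds another binding, so the case list has to grow with the hierarchy.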
Approaches to the Dynamic Binding Problem
reduce combinatorial explosion in the number of test cases that cover all possible combinations of polymorphic calls
Use static analysis (data flow analysis) to determine possible bindings
In most systems, the average number of “possible” bindings is 2
White-box vs. Black-box Testing of O-O
object-oriented specification describes functional behavior
implementation describes how that is achieved
UniqueTable example
  White box testing creates test cases that focus on how the table is implemented
“Jackpot” in previous example shows need for white-box testing
White box O-O Testing
conventional flow-graph approaches
  these techniques can be adapted to method testing, but are not sufficient for class testing
What about flow between methods?
Do methods in a class have a special relationship that deserves special consideration or are interprocedural techniques adequate?
Black-box O-O Testing
conventional black-box methods are useful for object-oriented systems
Additional proposals
  utilize specification integrated with the implementation as dynamic assertions
C++ assertions or Eiffel pre/post-conditions offer similar self-checking
State-based Testing
derives test cases by modeling a class as a state machine
methods result in state transitions
state model defines allowable transition sequences
  e.g., an instance must be created before it can be updated or deleted
test cases are devised to exercise each transition
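A minimal sketch of state-based testing for the create/update/delete example. The class `Record`, its three states and the `rejected` helper are hypothetical; disallowed transitions are assumed to throw `std::logic_error` so that a test can observe them.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class modelled as a state machine:
// NotCreated --create--> Created --remove--> Deleted, with update allowed in Created.
class Record {
public:
    enum class State { NotCreated, Created, Deleted };

    State state() const { return state_; }
    int value() const { return value_; }

    void create() {
        require(state_ == State::NotCreated);
        state_ = State::Created;
    }
    void update(int v) {
        require(state_ == State::Created);
        value_ = v;
    }
    void remove() {
        require(state_ == State::Created);
        state_ = State::Deleted;
    }
private:
    // Rejects any transition the state model does not allow.
    static void require(bool allowed) {
        if (!allowed) throw std::logic_error("transition not allowed");
    }
    State state_ = State::NotCreated;
    int value_ = 0;
};

// Helper for tests of disallowed transitions: true if the action was refused.
template <class F>
bool rejected(F action) {
    try {
        action();
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

A test suite derived from the model exercises each allowed transition once and also attempts transitions the model forbids, such as updating before creation.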
Problems with state-based O-O testing
may take a lengthy sequence of operations to get an object into a desired state
may not be productive if a class is designed to accept any possible sequence of method activation
state control may be distributed over an entire application
System-wide control makes it difficult to verify a class in isolation
a global state model is needed to show how classes interact
ASTOOT
Proposed by Phyllis Frankl and R.K. Doong
Requires each class to provide its own simplified “oracle”
Determines if two instances of a class are equivalent
Uses a class’ algebraic specification to derive alternative equivalent test cases
A form of specification-based testing
Uses the oracle to determine if the implementation of the class satisfies the specification of the class for the test cases
Simplified oracle
Requires that each class have an equivalence function, EQN, that determines if two instances of the same class are observationally equivalent
E.g. EQN( create.push(5).push(6).pop, create.push(5)) would return true
Can define EQN recursively using the access methods
Can define EQN using the underlying implementation
Example: recursive definition of EQN
if isempty(s1) and isempty(s2) then true
elseif isempty(s1) then false
elseif isempty(s2) then false
elseif top(s1) ≠ top(s2) then false
else EQN(pop(s1), pop(s2))
endif
Example: implementation based definition of EQN
EQN(s1, s2) returns flag
  flag := true;
  if size(s1) != size(s2) then flag := false;
  I := 0;
  while I < size(s1) and flag = true do
    if s1(I) != s2(I) then flag := false;
    I := I + 1;
  endwhile;
  return flag;
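Both definitions of EQN can be sketched for an integer stack. This is not ASTOOT's actual tooling: `Stack`, `eqn_recursive` and `eqn_impl` are hypothetical names, and push and pop return the stack itself only so that operation sequences can be chained as they are written in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical integer stack with chainable operations.
class Stack {
public:
    static Stack create() { return Stack(); }
    Stack& push(int v) { items_.push_back(v); return *this; }
    Stack& pop() { assert(!items_.empty()); items_.pop_back(); return *this; }
    int top() const { assert(!items_.empty()); return items_.back(); }
    bool isempty() const { return items_.empty(); }
private:
    friend bool eqn_impl(const Stack&, const Stack&);
    std::vector<int> items_;
};

// Recursive EQN using only the access methods. The stacks are taken by
// value, so popping here does not disturb the caller's objects.
bool eqn_recursive(Stack s1, Stack s2) {
    if (s1.isempty() && s2.isempty()) return true;
    if (s1.isempty() || s2.isempty()) return false;
    if (s1.top() != s2.top()) return false;
    return eqn_recursive(s1.pop(), s2.pop());
}

// Implementation-based EQN: compares the underlying representation directly.
bool eqn_impl(const Stack& s1, const Stack& s2) {
    if (s1.items_.size() != s2.items_.size()) return false;
    for (std::size_t i = 0; i < s1.items_.size(); ++i) {
        if (s1.items_[i] != s2.items_[i]) return false;
    }
    return true;
}
```

The recursive version depends only on the public interface, while the implementation-based version needs `friend` access and must be revisited whenever the representation changes.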
Create pairs of equivalent test cases
Use implicit (algebraic) specifications or variants of this approach to define test cases
Create test cases that are syntactically correct sequences of access methods
Can be either user defined or automatically generated from the algebraic specification
Using algebraic specifications, simplify or extend sequences to create “equivalent” test cases
E.g., create.push(5) = create.push(5).top = create.push(5).top.push(n).pop(n) = ...