
Upload: colin-hunt

Post on 28-Dec-2015


Testing: not just error detection

Helps to evaluate actual performance of the system

Need to test specifications and assumptions about the environment

Need to validate the performance

Compare actual performance to the worst-case analysis

Compare actual performance to the expected performance

Cannot always test the system!

E.g. recovery actions for a power plant failure

Simulation techniques try to imitate the environment in such cases

There are limits to what simulations can do

When should testing start?

When the system is fully implemented? NO!

Many potential problems that designers and coders were aware of during development are forgotten

The testing task is just too formidable at this point

The first rule of testing (D. Hamlet, J. Maybee)

Get out the test plan and follow it

Many aspects that need to be tested arise at requirements and design stages

• They need to be documented

• Testing of these aspects should be thought through

What should testing accomplish?

Software testing has to verify that software meets its requirements

Too naïve! Cannot verify!

E.g. a requirement can say “for all inputs, the program will…”

There may be infinitely many possible inputs

Realistically, software testing has to find failures

Important terminology (IEEE standards)

Failure: the software does something contrary to its specification

Fault: something in the program code from which a failure can arise

What about bug and defect?

They usually refer to faults

You observe failures and find faults

How does testing work?

It’s not a simple question. The answer depends on many parameters

The nature of the module being tested

• E.g. scientific computations vs. UI components

What type of validation we are looking for

• Functionality: behaves correctly, looks correct

• Performance

• Interoperability

The way that testing is performed

• Viewing the module being tested as a black or white box

The goals of testing

• Looking for bugs

• Proving to the users that the system “works”

How does testing work? (cont.)

[Diagram labels: test specification, inputs, object of testing, run-time environment, outputs, observations, oracle, outcome of testing]

Types of testing, depending on purpose of testing

Unit (or module) testing

• Individual components are tested independently

• E.g. class testing

• E.g. module testing

Integration testing

• Testing interfaces between several subsystems

System testing

• Testing the complete system

Acceptance testing

• Testing with the data supplied by the users

• May be done in the presence of users

Regression testing

• Testing an incremental version of the system

How do we select test cases?

By the number of test cases

• E.g. N random test cases

Based on some properties of the system

By their ability to detect faults

Test selection criteria

There is an abstract domain D of all possible test cases (all possible inputs to the program or module)

Let T be a subset of D

A test selection criterion is a predicate that specifies whether test set T is in some sense “enough” to test the program or module

Two uses for testing criteria:

• Stopping rule - know when the system has been tested enough

• Test data evaluation rule - evaluates quality of selected test cases

Several testing criteria may be used at the same time
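A minimal Python sketch of a test selection criterion as a predicate over a test set T. The helper names (`at_least_n`, `covers_every_class`, `all_of`) and the sign-based classifier are illustrative assumptions, not part of the slides.

```python
from typing import Callable, Iterable, Set

# A test selection criterion is a predicate over a test set T (a subset of D).
Criterion = Callable[[Set[int]], bool]

def at_least_n(n: int) -> Criterion:
    """Criterion based on the number of test cases."""
    return lambda tests: len(tests) >= n

def covers_every_class(classify: Callable[[int], str],
                       classes: Iterable[str]) -> Criterion:
    """Criterion based on a property of the inputs: every class is hit."""
    required = set(classes)
    return lambda tests: required <= {classify(t) for t in tests}

def all_of(*criteria: Criterion) -> Criterion:
    """Several criteria may be used at the same time."""
    return lambda tests: all(c(tests) for c in criteria)

def sign(x: int) -> str:
    # Hypothetical classifier for integer inputs.
    return "negative" if x < 0 else "zero" if x == 0 else "positive"

enough = all_of(at_least_n(3),
                covers_every_class(sign, ["negative", "zero", "positive"]))

print(enough({1, 2, 3}))    # only positive inputs: criterion not satisfied
print(enough({-4, 0, 9}))   # every class covered: stopping rule satisfied
```

Used as a stopping rule, `enough(T)` says when to stop adding tests; used as an evaluation rule, it grades an existing test set.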

Ideal test selection criterion

A test selection criterion is ideal if for any test set T that satisfies this criterion, T detects all errors, if any, in the program/module

Of course, it is desirable that T is manageable in size, so that testing does not take forever

In general, only the test criterion that requires running all tests in D is ideal

A test selection criterion can be useful even if not ideal

A test criterion is useful if, for any T that satisfies this criterion, finding no errors when running the tests from T means that the program/module is highly reliable

At present, no particularly good testing criteria exist

Or at least, none of the existing ones have been proved particularly good

Test data selection techniques

Random

Interface based

Fault based

• Error seeding (mutation testing)

• Fault constraints

Error based

• Domain and computation based

Coverage based

• Control flow

• Data flow

Random testing

Based on a description of the test data, randomly select test cases

Provides a statistical model of the reliability of the system

If the system fails on one test case out of 100, expect it to perform correctly about 99% of the time

Confidence in prediction increases as the number of test cases increases

In practice, random testing has proved to be a reasonable testing strategy, especially if the results can be evaluated automatically

Alternative testing techniques should be compared to random testing
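A small Python sketch of random testing with automatic evaluation. The unit under test (`faulty_abs`), its seeded fault, and the input range are hypothetical; the built-in `abs` plays the role of the oracle.

```python
import random

def estimate_failure_rate(unit, oracle, n_tests, seed=0, low=-1000, high=1000):
    """Run n_tests random inputs and return the observed fraction of failures."""
    rng = random.Random(seed)          # fixed seed so a run can be repeated
    failures = 0
    for _ in range(n_tests):
        x = rng.randint(low, high)
        if unit(x) != oracle(x):       # automatic evaluation of the result
            failures += 1
    return failures / n_tests

def faulty_abs(x):
    # Hypothetical unit under test: wrong for x in -9..-1.
    return x if x > -10 else -x

rate = estimate_failure_rate(faulty_abs, abs, n_tests=1000)
print(f"observed failure rate: {rate:.3f}, estimated reliability: {1 - rate:.3f}")
```

The estimate is only as good as the match between the random input distribution and real usage; a fault confined to a small input region may need many test cases before it shows up.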

Black box vs. white box testing

Black box testing

• Test case selection does not take the structure of the system into account

• Usually test cases are selected based on the types of inputs

White box testing

• Test case selection is done by analyzing the structure/composition of the system

Equivalence partitioning (a black box approach)

[Diagram labels: all possible inputs, representative inputs, object of testing, outputs]

How is domain partitioning done?

Based on the requirements for the system

E.g., if a system deals with controlling a pressure valve for a steam engine

Opens the valve if the pressure in the tank exceeds a certain threshold HIGH

Closes the valve if the pressure in the tank drops below a certain threshold LOW

All non-negative real numbers, partitioned by the pressure p:

p <= LOW      LOW < p < HIGH      p >= HIGH
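The partitioning can be sketched in Python as follows. The threshold values for `LOW` and `HIGH` are made up for illustration; the slides give none.

```python
LOW, HIGH = 20.0, 80.0   # hypothetical thresholds

def partition(p: float) -> str:
    """Map a pressure reading to one of the three partitions on the slide."""
    if p < 0:
        raise ValueError("pressure must be non-negative")
    if p <= LOW:
        return "p <= LOW"
    if p < HIGH:
        return "LOW < p < HIGH"
    return "p >= HIGH"

# One representative per partition, plus the two boundary values.
representatives = [5.0, 50.0, 95.0, LOW, HIGH]

for p in representatives:
    print(p, "->", partition(p))
```

The boundary values are included because the requirement wording ("exceeds", "drops below") and the partition boundaries (<=, >=) treat equality differently, which is exactly where a fault could hide.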

White-box (program-based) Test Data Selection

structural

• coverage based

fault-based

• e.g., mutation testing, RELAY

error-based

• domain and computation based

• use representations created by symbolic execution

Coverage Criteria

control-flow adequacy criteria

G = (N, E, S, T) where

• the nodes N represent executable instructions (statement or statement fragment);

• the edges E represent the potential transfer of control;

• S is a designated start node;

• T is a designated final node

E = { (n_i, n_j) | syntactically, the execution of n_j follows the execution of n_i }

Control-Flow-Graph-Based Coverage Criteria

• Statement Coverage

• Path Coverage

• Branch Coverage

• Hidden Paths

• Loop Guidelines

• Boundary - Interior

Selecting paths that satisfy the criteria

static selection

• some of the associated paths may be infeasible

dynamic selection

• monitors coverage and displays areas that have not been satisfactorily covered

Statement Coverage

requires that each statement in a program be executed at least once

only about 1/3 of NASA statements were executed before software was released (Stucki 1973)

usually can achieve 85% coverage easily, but why not 100%?

• unreachable code

• dead code

• complex sequence (should be tested!)

Microsoft reports 80-90% code coverage

Coincidental Correctness

Executing a statement does not guarantee that a fault on that path will be revealed

Y := X + 2   versus   Y := X ** 2

If X = 2 then the fault is not exposed
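A short Python sketch of coincidental correctness. Which of the two statements is the intended one is not stated on the slide, so the roles below are an assumption.

```python
def intended(x):
    return x + 2      # assumed to be the intended statement

def faulty(x):
    return x ** 2     # assumed to be the faulty statement

# Inputs on which the faulty code happens to give the intended result.
coincidental = [x for x in range(-5, 6) if intended(x) == faulty(x)]
print(coincidental)   # inputs that execute the fault without exposing it
```

In this sketch X = -1 hides the fault just as X = 2 does; any other input in the range exposes it, which is why executing a statement once says little about its correctness.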

Branch Coverage

Requires that each branch in a program (each edge in a control flow graph) be executed at least once

e.g., Each predicate must evaluate to each of its possible outcomes

Branch coverage is stronger than statement coverage

Branch Coverage

[Figure: control flow graph of a loop with nodes 1, 2, and 3]

STATEMENT COVERAGE: PATH 1, 2, 3

BRANCH COVERAGE: PATH 1, 2, 1, 2, 3
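The difference between the two paths can be checked with a small Python sketch. The edge set is an assumption reconstructed from the paths (1 to 2, 2 back to 1, 2 to 3), since the figure itself did not survive in the transcript.

```python
# Assumed control flow graph of the loop example.
NODES = {1, 2, 3}
EDGES = {(1, 2), (2, 1), (2, 3)}

def statement_coverage(path):
    """Fraction of nodes visited by the path."""
    return len(set(path) & NODES) / len(NODES)

def branch_coverage(path):
    """Fraction of edges traversed by the path."""
    taken = set(zip(path, path[1:]))
    return len(taken & EDGES) / len(EDGES)

for path in ([1, 2, 3], [1, 2, 1, 2, 3]):
    print(path, statement_coverage(path), branch_coverage(path))
```

Path 1, 2, 3 visits every node but never takes the back edge, so it reaches full statement coverage without full branch coverage.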

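Hidden Path (branch) Coverage

Requires that each condition in a compound predicate be tested

Example:

( X > 1 ) OR ( Y < 2 )

Test Data:

X = 2, Y = 5 -> T

X = 1, Y = 5 -> F

but, true branch is never tested for data where Y < 2.

Combinations of the two conditions:

( X > 1 )   ( Y < 2 )
    T           F
    F           T
    T           T
    F           F

[Figure: the compound predicate drawn as two separate decisions, X > 1 and Y < 2, each with T and F branches]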
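A Python sketch of the hidden-path problem, assuming the compound predicate is (X > 1) OR (Y < 2), which is what the slide's test results imply. The extra test case (1, 1) is an illustrative addition.

```python
def predicate(x, y):
    return (x > 1) or (y < 2)

def condition_outcomes(tests):
    """Collect the truth values each individual condition takes over the tests."""
    return {
        "x > 1": {x > 1 for x, y in tests},
        "y < 2": {y < 2 for x, y in tests},
    }

slide_tests = [(2, 5), (1, 5)]           # test data from the slide
extended = slide_tests + [(1, 1)]        # adds a case where only y < 2 holds

print([predicate(x, y) for x, y in slide_tests])
print(condition_outcomes(slide_tests))
print(condition_outcomes(extended))
```

The slide's two tests cover both branches of the whole predicate, yet `y < 2` only ever evaluates to false; the added case makes that condition take both outcomes.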

Loop Coverage

Path 1, 2, 1, 2, 3 executes all branches (and all statements) but does not execute the loop well.

[Figure: control flow graph of a loop with nodes 1, 2, and 3]

Path Coverage

Requires that every executable path in the program be executed at least once

In most programs, path coverage is impossible

Example:

read N;
SUM := 0;
for I = 1 to N do
    read X;
    SUM := SUM + X;
endfor

How do we choose a set of paths?

Typical Guidelines for loop coverage

• fall through case

• minimum number of iterations

• minimum + 1 number of iterations

• maximum number of iterations

• maximum - 1 number of iterations

[Figure: loop control flow graph with nodes 1, 2, 3 and example paths 1, 2, 3; 1, 2, 1, 2, 3; 1, 2, 1, 2, 1, 2, 3; ...]
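The guidelines can be turned into concrete iteration counts, as in this Python sketch. It assumes the fall-through case means zero iterations; `loop_test_counts` and `sum_first` are hypothetical helpers, the latter mirroring the SUM loop from the path coverage example.

```python
def loop_test_counts(minimum, maximum):
    """Iteration counts suggested by the guidelines, without duplicates."""
    candidates = [0, minimum, minimum + 1, maximum - 1, maximum]
    counts = []
    for c in candidates:
        if 0 <= c <= maximum and c not in counts:
            counts.append(c)
    return counts

def sum_first(values, n):
    """Sum the first n values, iterating the loop body n times."""
    total = 0
    for i in range(n):
        total += values[i]
    return total

data = list(range(1, 11))                  # hypothetical input of 10 values
for n in loop_test_counts(minimum=1, maximum=len(data)):
    print(n, sum_first(data, n))
```

Five runs replace the unbounded set of paths through the loop, at the cost of leaving the iteration counts in between untested.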

Boundary - Interior Criteria

boundary test of a loop causes the loop to be entered but not iterated

interior test of a loop causes a loop to be entered and then iterated at least once

both boundary and interior tests are to be selected for each unique path through the loop

Example

[Figure: control flow graph with nodes 1 to 8]

Paths for Example

Boundary paths

a: 1,2,3,5,7

b: 1,2,3,6,7

c: 1,2,4,5,7

d: 1,2,4,6,7

Interior paths

a,b  a,c  a,d  b,a  b,c  b,d  c,a

x,x for x = a, b, c, d
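The path sets can be enumerated mechanically, as in this Python sketch. Reading an interior test as an ordered pair of passes through the loop body is an interpretation of the slide's a,b / x,x notation.

```python
from itertools import product

# Boundary paths from the slide, keyed by their labels.
boundary = {
    "a": (1, 2, 3, 5, 7),
    "b": (1, 2, 3, 6, 7),
    "c": (1, 2, 4, 5, 7),
    "d": (1, 2, 4, 6, 7),
}

# Interior tests: one pass through the body followed by a second pass,
# for every ordered pair of body paths (a,a), (a,b), ..., (d,d).
interior = list(product(boundary, repeat=2))

print(len(boundary), "boundary paths")
print(len(interior), "interior path pairs")
```

With two two-way decisions in the loop body, the criterion asks for 4 boundary tests and 16 interior tests, which is still finite, unlike full path coverage.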

Validating Object Oriented Systems

Do OO systems make validation harder or easier?

Does code reuse lead to validation reuse?

Do we need to change existing techniques?

Do we need to develop new techniques?

Issues in O-O testing

• basic unit for unit testing

• implications of encapsulation

• implications of inheritance

• implications of genericity

• implications of polymorphism/dynamic binding

• implications for integration testing

Unit testing

[Diagram labels: driver(s), unit, stub(s), oracle]

test scaffolding

• can be created for general or for specific tests

• is composed of

one or more drivers

• provide a prototype activation environment

• drivers initialize non-local variables and parameters and call the unit

one or more stubs

• provide a prototype of the units used by the program to be tested

one or more oracles

• identify the tests that cause failures
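A minimal Python sketch of test scaffolding. The unit (`average_reading`), the sensor it depends on, and the tolerance are all hypothetical; the point is only the division of labor between driver, stub, and oracle.

```python
def average_reading(sensor, samples):
    """Unit under test: averages `samples` readings from a sensor object."""
    return sum(sensor.read() for _ in range(samples)) / samples

class SensorStub:
    """Stub: stands in for the real sensor the unit depends on."""
    def __init__(self, values):
        self._values = iter(values)

    def read(self):
        return next(self._values)

def oracle(actual, expected, tolerance=1e-9):
    """Oracle: decides whether a test run passed."""
    return abs(actual - expected) <= tolerance

def driver():
    """Driver: sets up parameters, calls the unit, and consults the oracle."""
    cases = [([10.0, 20.0, 30.0], 20.0), ([5.0], 5.0)]
    return [oracle(average_reading(SensorStub(vals), len(vals)), want)
            for vals, want in cases]

print(driver())
```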

Unit Testing Object-Oriented systems

procedural programming

• basic component: subroutine

• testing method: subroutine input/output based

object-oriented programming

• basic component: class = data structure + set of operations

• objects are instances of classes

• data structure defines the state of the object, thus correctness is not based only on output, but also on the state

• data structure is not directly accessible, but can only be accessed using the access methods (encapsulation)

Example

class Watcher {
private:
    ...
    int Current_Value;
    int Last_Value;
    int Status;
    void check_pressure();
    void alarm(int);
public:
    void start();
};

void Watcher::check_pressure() {
    ...
    Last_Value = Current_Value;
    Current_Value = reactor.temperature;
    if (Current_Value > NORMAL)
        if (Current_Value - Last_Value > 20)
            if (Status == 2)
                alarm(3);       // start critical situation alarm
            else {
                Status = 2;     // set status to warning level
                alarm(2);       // send warning signal
            }
        else ...                // other possible situations
}

value produced by method check_pressure depends on the state of class Watcher (variable Last_Value)

failures due to incorrect values of Last_Value can be revealed only by tests that have control on that variable

example by Mauro Pezze & Michael Young ©1998
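A Python sketch of why the test must control the object's state. It is a loose port of the Watcher logic, not the authors' code: the `NORMAL` value, the `alarms` list used for inspection, and passing the reading as a parameter are assumptions.

```python
NORMAL = 100   # hypothetical threshold; the slide does not give a value

class Watcher:
    def __init__(self, current_value=0, status=0):
        self.current_value = current_value
        self.last_value = current_value
        self.status = status
        self.alarms = []                 # records alarm levels for inspection

    def check_pressure(self, reading):
        self.last_value = self.current_value
        self.current_value = reading
        if self.current_value > NORMAL:
            if self.current_value - self.last_value > 20:
                if self.status == 2:
                    self.alarms.append(3)    # critical situation alarm
                else:
                    self.status = 2          # warning level
                    self.alarms.append(2)    # warning signal

# Same input, different prior state, different behavior.
slow = Watcher(current_value=110)
slow.check_pressure(125)          # rise of 15: no alarm
fast = Watcher(current_value=90)
fast.check_pressure(125)          # rise of 35: warning
print(slow.alarms, fast.alarms)
```

Both objects receive the same reading, so an input/output test that cannot set the prior value would see only one of the two behaviors.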

Basic Unit for Testing

the class is the natural unit for unit test case design

methods are meaningless apart from their class

testing a class instance (an object) can verify a class in isolation

when individually verified classes are used to create more complex classes in an application system, the entire subsystem must be tested as a whole before it can be considered to be verified (integration testing)

Encapsulation

not a source of errors but may be an obstacle to testing

how to get at the concrete state of an object?

break the encapsulation

• using features of the languages: C++ friend, Ada95 Child Unit

• use low level probes or debug tools to manually inspect

How to get at the concrete state of an object?

Use the abstraction

• Scenarios--examine sequences of events

• State is implicitly inspected via access methods

Use or provide hidden functions to examine the state

• Useful for debugging throughout the life of the system

• But modified code may alter the behavior

• Especially true for languages that do not support strong typing

Implications of Inheritance

inherited features often require re-testing because a new context of usage results when features are inherited

multiple inheritance increases the number of contexts to test

specialization relationships

• implementation specialization should correspond to problem domain specialization

• reusability of superclass test cases depends on this

Which functions must be tested in a subclass?

Example

Parent class contains:
    foo(int x)
    range() - returns a number in the range 1 to 10 inclusive

Child class contains:
    range() - is redefined to return a number in the range 0 to 20 inclusive
    // foo() is inherited

foo contains the code:
    if (x < 0) x = x / range();    // range() is the method redefined in Child
    return x;

have to test: when x < 0, could divide by 0

child::range() has to be tested afresh

does child::foo() have to be retested?

child::foo() may not have to be completely tested if code in foo() doesn’t depend on range(); doesn’t call it nor call any code that indirectly calls it
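A Python sketch of the example, assuming foo() divides by the value returned by range(). The concrete return values (1 for the parent, 0 for the child) are picked from the specified ranges for illustration.

```python
class Parent:
    def range(self):
        return 1                 # any value in 1..10 satisfies the parent spec

    def foo(self, x):
        if x < 0:
            x = x / self.range()
        return x

class Child(Parent):
    def range(self):
        return 0                 # legal under the child spec (0..20)

def run_foo_test(cls, x):
    """Return the result of foo(x), or the name of the exception raised."""
    try:
        return cls().foo(x)
    except ZeroDivisionError as exc:
        return type(exc).__name__

print(run_foo_test(Parent, -4))
print(run_foo_test(Child, -4))   # inherited foo() fails in the new context
print(run_foo_test(Child, 4))    # x >= 0 never reaches range()
```

The code of foo() is unchanged in the child, but its x < 0 path has to be retested because it depends on the redefined range().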

Can tests for a parent class be reused for a child class?

parent::range() and child::range() are two different functions with different specifications and implementations

tests are derived from the different specification and implementation

but the functions are likely to be similar, so the better the OO design, the greater the overlap

new tests are those for child::range requirements that are not satisfied by the parent::range tests

the simpler a test, the more likely it is to be reusable in subclasses

but simple tests tend to find only the simple faults

Example

Parent::describedSelf() is this code:
    if (val < 0) message(“Less”)
    else if (val == 0) message(“Equal”)
    else message(“More”)

Child::describedSelf() is this code:
    if (val < 0) message(“Less”)
    else if (val == 0) message(“Zero Equal”)
    else {
        message(“More”)
        if (val == 42) message(“Jackpot”)
    }

Parent tests and what happens to them for Child:

input   expected output   status for Child
-1      Less              OK
0       Equal             Change: expected output becomes Zero Equal
1       More              OK
42      Jackpot           Add
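A Python sketch of reusing the parent's tests for the child. Returning the messages as a list instead of calling message(), and the snake_case method name, are assumptions made so the tests can compare outputs.

```python
class Parent:
    def __init__(self, val):
        self.val = val

    def describe_self(self):
        if self.val < 0:
            return ["Less"]
        elif self.val == 0:
            return ["Equal"]
        return ["More"]

class Child(Parent):
    def describe_self(self):
        if self.val < 0:
            return ["Less"]
        elif self.val == 0:
            return ["Zero Equal"]
        messages = ["More"]
        if self.val == 42:
            messages.append("Jackpot")
        return messages

parent_tests = {-1: ["Less"], 0: ["Equal"], 1: ["More"]}
# Child suite: reuse the parent tests, change one, add one.
child_tests = {**parent_tests, 0: ["Zero Equal"], 42: ["More", "Jackpot"]}

def failures(cls, tests):
    """Inputs whose actual output differs from the expected output."""
    return [x for x, want in tests.items() if cls(x).describe_self() != want]

print(failures(Parent, parent_tests))
print(failures(Child, parent_tests))   # the unchanged parent suite on Child
print(failures(Child, child_tests))
```

Running the unchanged parent suite against Child flags only input 0; it says nothing about 42, which no specification-derived parent test would ever pick.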

Approaches to Inheritance Testing

flattening inheritance

• each subclass is tested as if all inherited features were newly defined

• tests used in the super-classes can be reused

• many tests are redundant

incremental testing

• reduce tests only to new/modified features

• determining what needs to be tested requires automated support

Polymorphism

in procedural programming, procedure calls are statically bound

each possible binding of a polymorphic component requires a separate set of test cases

may be hard to find all such bindings

may also complicate integration planning

Example

void resize(Shape polygon) {
    ...
    data = polygon.area();
    ...
}

[Class diagram: Shape with subclasses square, triangle, pentagon, ...]

Which implementation of area is actually called?
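A Python sketch of testing one call site under each possible binding. The Shape subclasses, their constructors, and the `factor` parameter of `resize` are hypothetical; the slide elides the body of resize().

```python
class Shape:
    def area(self):
        raise NotImplementedError

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

class Triangle(Shape):
    def __init__(self, base, height):
        self.base, self.height = base, height

    def area(self):
        return 0.5 * self.base * self.height

def resize(polygon, factor):
    """Hypothetical stand-in for resize(): returns the scaled area."""
    return polygon.area() * factor

# One test case per possible binding of polygon.area().
bindings = [(Square(2), 8), (Triangle(4, 3), 12.0)]
for shape, expected in bindings:
    print(type(shape).__name__, resize(shape, 2) == expected)
```

Every subclass added to the hierarchy adds a binding, so the list of test cases has to grow with it.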

Approaches to the Dynamic Binding Problem

reduce combinatorial explosion in the number of test cases that cover all possible combinations of polymorphic calls

Use static analysis (data flow analysis) to determine possible bindings

In most systems, the average number of “possible” bindings is 2

White-box vs. Black-box Testing of O-O

object-oriented specification describes functional behavior

implementation describes how that is achieved

UniqueTable example

White box testing creates test cases that focus on how the table is implemented

“Jackpot” in previous example shows need for white-box testing

White box O-O Testing

conventional flow-graph approaches

• these techniques can be adapted to method testing, but are not sufficient for class testing

What about flow between methods?

Do methods in a class have a special relationship that deserves special consideration or are interprocedural techniques adequate?

Black-box O-O Testing

conventional black-box methods are useful for object-oriented systems

Additional proposals

• utilize specification integrated with the implementation as dynamic assertions

• C++ assertions or Eiffel pre/post-conditions offer similar self-checking

State-based Testing

derives test cases by modeling a class as a state machine

methods result in state transitions

state model defines allowable transition sequences

• e.g., an instance must be created before it can be updated or deleted

test cases are devised to exercise each transition

Example: State model of a stack

[State diagram: states isempty and Not isempty, entered via create, with transitions labeled push, top, and pop]
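A Python sketch of state-based tests for a stack. The two-state model (isempty, not isempty) follows the slide; the list-backed `Stack` class and the particular set of transitions tested are illustrative assumptions.

```python
class Stack:
    def __init__(self):            # "create"
        self._items = []

    def push(self, x):
        self._items.append(x)

    def pop(self):
        return self._items.pop()

    def top(self):
        return self._items[-1]

    def isempty(self):
        return not self._items

def state(s):
    return "isempty" if s.isempty() else "not isempty"

# Each test: (operations to reach the start state, operation under test,
# expected state afterwards).
transition_tests = [
    ([], ("push", 1), "not isempty"),                      # from isempty
    ([("push", 1)], ("push", 2), "not isempty"),           # stays non-empty
    ([("push", 1)], ("top",), "not isempty"),              # top keeps the state
    ([("push", 1), ("push", 2)], ("pop",), "not isempty"),
    ([("push", 1)], ("pop",), "isempty"),                  # last pop empties it
]

def run(setup, op, expected):
    s = Stack()
    for name, *args in setup + [op]:
        getattr(s, name)(*args)
    return state(s) == expected

print([run(*t) for t in transition_tests])
```

The setup sequences grow with the depth of the state being targeted, which is the first problem with state-based testing in practice.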

Problems with state-based O-O testing

may take a lengthy sequence of operations to put an object in a desired state

may not be productive if a class is designed to accept any possible sequence of method activation

state control may be distributed over an entire application

System-wide control makes it difficult to verify a class in isolation

a global state model is needed to show how classes interact

ASTOOT

Proposed by Phyllis Frankl and R.K. Doong

Requires each class to provide its own simplified “oracle”

Determines if two instances of a class are equivalent

Uses a class’ algebraic specification to derive alternative equivalent test cases

A form of specification-based testing

Uses the oracle to determine if the implementation of the class satisfies the specification of the class for the test cases

Simplified oracle

Requires that each class have an equivalence function, EQN, that determines if two instances of the same class are observationally equivalent

E.g. EQN( create.push(5).push(6).pop, create.push(5)) would return true

Can define EQN recursively using the access methods

Can define EQN using the underlying implementation

Example: recursive definition of EQN

if isempty(s1) and isempty(s2) then true
elseif isempty(s1) then false
elseif isempty(s2) then false
elseif top(s1) ≠ top(s2) then false
else EQN(pop(s1), pop(s2))
endif

Example: implementation-based definition of EQN

EQN(s1, s2) returns flag
    flag := true;
    I := 0;                       // assumes elements are indexed from 0
    if size(s1) != size(s2) then flag := false;
    while I < size(s1) and flag = true do
        if s1(I) != s2(I) then flag := false;
        I := I + 1;
    endwhile;
    return flag;
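Both definitions of EQN can be sketched in Python. Modeling a stack as an immutable tuple whose last element is the top is an assumption made to keep the example short; it is not how ASTOOT represents objects.

```python
def eqn_recursive(s1, s2):
    """Observational equivalence using only isempty, top, and pop."""
    if not s1 and not s2:
        return True
    if not s1 or not s2:
        return False
    if s1[-1] != s2[-1]:                       # top(s1) != top(s2)
        return False
    return eqn_recursive(s1[:-1], s2[:-1])     # compare pop(s1) and pop(s2)

def eqn_implementation(s1, s2):
    """Equivalence by direct comparison of the underlying representation."""
    if len(s1) != len(s2):
        return False
    return all(a == b for a, b in zip(s1, s2))

print(eqn_recursive((5,), (5,)), eqn_implementation((5, 6), (5,)))
```

The recursive version depends only on the access methods, so it keeps working if the representation changes; the implementation-based version is simpler but tied to the data structure.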

Create pairs of equivalent test cases

Use implicit (algebraic) specifications or variants of this approach to define test cases

Create test cases that are syntactically correct sequences of access methods

Can be either user defined or automatically generated from the algebraic specification

Using algebraic specifications, simplify or extend sequences to create “equivalent” test cases

E.g., create.push(5) = create.push(5).top = create.push(5).top.push(n).pop(n) = ...
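A Python sketch of checking pairs of equivalent test cases with a simplified oracle. The tuple encoding of operation sequences and the list-backed stack are assumptions; `top` is treated as an observer that leaves the state unchanged, as in the algebraic reading of the sequences above.

```python
def run_sequence(ops):
    """Apply a sequence of stack operations to a fresh (created) stack."""
    items = []
    for name, *args in ops:
        if name == "push":
            items.append(args[0])
        elif name == "pop":
            items.pop()
        elif name == "top":
            _ = items[-1]          # observer: does not change the state
    return items

def eqn(a, b):
    # Simplified oracle: compares the final states of two test sequences.
    return a == b

original = [("push", 5), ("push", 6), ("pop",)]
simplified = [("push", 5)]                      # push(x).pop cancels out
extended = [("push", 5), ("top",), ("push", 9), ("pop",)]

print(eqn(run_sequence(original), run_sequence(simplified)))
print(eqn(run_sequence(extended), run_sequence(simplified)))
```

If the oracle reports that two sequences the specification calls equivalent end in different states, the implementation has a failure, and no expected output had to be written by hand.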