Intro to Software Engineering - Software Quality Assurance
TRANSCRIPT
Software quality assurance
McGill ECSE 321: Intro to Software Engineering
Radu Negulescu
Fall 2003
About this module
There is a difference between software that just happens to work and software that is known to work with a high degree of confidence
Here we discuss
• Concepts of software quality assurance
• Activities of software quality assurance
• Reliability and release decision
We do not discuss testing
• Next module – a whole topic by itself
Some terms to master
Bugs and defects
• Bug: named for a Harvard Mark II malfunction caused by a moth (1947)
• Failure: deviation from specified behavior
• Defect (fault): cause of a failure
Some sources also distinguish
• Error: system state that leads to a failure [BD]
• Fault vs. defect: before or after release [Pressman]
Graphical impression of defects, errors, and failures [after BD]: defect (fault) → error → failure
SQA techniques
Main types of SQA
• Verification: the program conforms to the specification
  “Are we building the product right?”
  Examples: testing, formal verification
• Validation: the specified system is what the customer wants built
  “Are we building the right product?”
  Examples: prototyping, derivation of user acceptance tests
• Fault prevention: decrease the chance of occurrence of faults
  Examples: using a modular structure, documenting assertions (see the sketch below), etc.
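A minimal sketch of the last example, “documenting assertions”: the method's assumptions are stated as executable checks rather than only in comments. The class and method names here are hypothetical.

```java
// Fault prevention by documenting assertions: preconditions are
// stated as executable checks, not just comments. Hypothetical method.
public class Stats {
    static double average(int[] values) {
        assert values != null && values.length > 0 : "non-empty array required";
        long sum = 0;
        for (int v : values) sum += v;
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        System.out.println(average(new int[] {12, 5, 9}));  // prints 8.666...
    }
}
```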
SQA techniques
Debugging
• Fault search, location, repair
Testing
• Unit, integration, system, …
• Alpha, beta, …
• Functional, performance, usability, …
Manual checks
• Reviews, inspections, walkthroughs, …
Modeling and prototyping
Reliability measurement
Formal methods
Defect management
…
Relative effectiveness
Technique                     Low   Med   High
Personal design checking      15%   35%   70%
Design reviews                30%   40%   60%
Design inspections            35%   55%   75%
Code inspections              30%   60%   70%
Prototyping                   35%   65%   80%
Unit testing                  10%   25%   50%
Group-test related routines   20%   35%   55%
System testing                25%   45%   60%
Field testing                 35%   50%   65%
Cumulative                    93%   99%   99%
[Programming Productivity, Jones 1986]
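As a plausibility check (our own computation, not from Jones), if the techniques are assumed to detect faults independently, the cumulative row is roughly what combining all the rows would predict. For the Low column:

\[
1 - \prod_i (1 - p_i) = 1 - (0.85)(0.70)(0.65)(0.70)(0.65)(0.90)(0.80)(0.75)(0.65) \approx 0.94
\]

which is consistent with the reported 93% cumulative effectiveness.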
Relative effectiveness
Observations
• Individually, none of these techniques has a definite statistical advantage
They tend to discover different types of faults
  Testing: extreme cases and human oversights
  Reviews: common errors
Therefore, a combination of techniques is most effective
• Emphasis on upstream SQA
  Redistribute resources from debugging into SQA activities in the early stages of a software project
• Not included in the study
  Debugging: triggered by non-systematic, unpredictable fault search; yet, it cannot be completely avoided
  Formal methods: limited to small subsystems or high-level interfaces; useful in some niche applications
Debugging
Finding faults from an unplanned failure
• Correctness debugging: determine and repair deviations from specified functional requirements
• Performance debugging: address deviations from non-functional requirements
Debugging requires skill
• 20:1 difference in effectiveness between best and worst
Debugging activities
Fault search
• Unpredictable, costly
• Should be replaced by other techniques wherever possible
Fault location
• Can and should be done in a systematic manner
• Use tool assistance
Fault repair
• May introduce new faults
Each of these is discussed in the following.
Debugging pitfalls
Don’t do this!
• Locate faults by guessing without a rational basis for the guess
  “Superstition debugging”; do not confuse with an “educated guess”
• Fix the symptom without locating the bug
  Branching on the “problem input” creates numerous problems by itself
• Become depressed if you can’t find the bug
  This can be avoided by staying in control with systematic techniques
Programmer statistics: 20:1 differences in effectiveness at debugging!
Debugging pitfalls
Is it a horse?
No!
Is it a chair?
No!
Is it a pencil?
No!
...
How to lose the “20 questions” game
Locating a fault
Steps in locating a fault
• Stabilize the failure
  Determine the symptom: observed output ≠ expected output
  Determine inputs on which the failure occurs predictably
• Simplify the failure
  Experiment with simpler data; see if the failure still happens
• Progressively reduce the scope of the fault
  Some form of binary search works best (weighted binary trees); see the sketch below
• The “scientific method” works for all of the above
  This is how science has been done since ancient times
  Manufacturing QA uses elaborate “design of experiment” techniques
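A minimal sketch of the binary-search idea, assuming a deterministic, hypothetical fails(prefixLength) check that replays the first prefixLength recorded inputs, and assuming the failure is monotone (once it appears in a prefix, longer prefixes also fail):

```java
// Narrow the fault's scope by binary search: find the shortest prefix
// of the recorded input sequence that already triggers the failure.
public class FaultScope {
    static boolean fails(int prefixLength) {
        // Hypothetical: replay the first prefixLength inputs and
        // report whether the failure is observed.
        return prefixLength >= 42;  // placeholder for the real check
    }

    static int firstFailingPrefix(int totalLength) {
        int lo = 1, hi = totalLength;   // assumes fails(totalLength) is true
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (fails(mid)) hi = mid;   // failure already present: look earlier
            else lo = mid + 1;          // failure appears later
        }
        return lo;                      // shortest failing prefix
    }

    public static void main(String[] args) {
        System.out.println(firstFailingPrefix(100));  // prints 42 here
    }
}
```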
“Scientific method”
Steps in the scientific method
• Examine data that reveals a phenomenon
• Form a hypothesis to explain the data
• Design an experiment that can confirm or disprove the hypothesis
• Perform the experiment and either adopt or discard the hypothesis
• Repeat until a satisfactory hypothesis is found and adopted
Example
• Hypothesis: the memory access violation occurs in module A
• Experiment: run with a breakpoint at the start of module A
  Or, insert a print statement at the start of A
Example
• Hypothesis: the fault was introduced by Joe
• Experiment: compare Joe’s source code to the previous version
  E.g., by running diff under UNIX
Locating a fault
Example
• IntBag: contains unordered integers, some of which may be equal
  E.g. {12, 5, 9, 9, 9, -4, 100}
• Suppose that the following failure occurs for an IntBag object:
  Methods invoked (“input”):
  insert(5); insert(10); insert(10); insert(10); extract(10); extract(10); total()
  Failure symptom: expected return value for total() = 15; observed value = 5
• Debugging strategy
  What would be an effective way to locate the fault?
“Scientific method” in practice
Use scaffolding to reproduce the error on separate modules
• Scope narrowing: isolate from rest of system
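For instance, a minimal scaffold for the IntBag failure above replays the failing sequence against the class in isolation. IntBag and its methods are as given in the example; the no-argument constructor is our assumption.

```java
// Scaffolding: reproduce the IntBag failure outside the full system,
// so the failing sequence can be re-run and simplified at will.
public class IntBagScaffold {
    public static void main(String[] args) {
        IntBag bag = new IntBag();   // assumed constructor
        bag.insert(5);
        bag.insert(10);
        bag.insert(10);
        bag.insert(10);
        bag.extract(10);
        bag.extract(10);
        System.out.println("expected=15 observed=" + bag.total());
        // Next step: drop or simplify calls and re-run, looking for
        // the smallest sequence that still shows the wrong total.
    }
}
```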
Other examples?
Using debugging tools
Experiment with debugger features:
• Control: step into, step over, continue, run to cursor, set variable, ...
• Observation: breakpoints, watches (expression displays)
• Advanced: stack, memory leaks, ...
Combine debugging with your own reasoning about correctness
• Example
  Infer that i should be == n after “for (i = 2; i < n; i++) {…}”
  Although some side effects may overwrite i (see the sketch below)
Step through the code with a debugger
• Watches on
• Assertions enabled
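As a sketch, the inference above can be made executable (assuming n >= 2 and no hidden writes to i); run with assertions enabled (java -ea) or step through with watches on i and n:

```java
public class LoopCheck {
    public static void main(String[] args) {
        int n = 10;  // hypothetical value; the inference assumes n >= 2
        int i;
        for (i = 2; i < n; i++) {
            // ... loop body, assumed not to modify i ...
        }
        // If a side effect overwrote i, this check fires under -ea.
        assert i == n : "unexpected i=" + i + " (n=" + n + ")";
    }
}
```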
Repairing faults
Make sure you understand the problem before fixing it
• As opposed to patching up the program to avoid the symptom
• Fix the problem, not the symptom
Always perform regression tests after the fix
• I.e., use debugging in combination with systematic testing
Always look for similar faults
• E.g., by including the fault type on a review checklist
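A sketch of the regression-test idea above for the IntBag fix, written here in JUnit 4 style (the test name is ours):

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Regression test: the once-failing scenario stays in the test suite,
// so the repaired fault cannot silently reappear.
public class IntBagRegressionTest {
    @Test
    public void totalAfterExtractingDuplicates() {
        IntBag bag = new IntBag();
        bag.insert(5); bag.insert(10); bag.insert(10); bag.insert(10);
        bag.extract(10); bag.extract(10);
        assertEquals(15, bag.total());
    }
}
```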
Miscellaneous debugging tips
Avoid debugging as much as you can!
• Enlightened procrastination
• When you have to debug, debug less and reason more
Talk to others about the failure
See debugging as opportunity
• Learn about the program
• Learn about likely kinds of mistakes
• Learn about how to fix errors
Never debug standing up!
Manual checks
Manual examination of any software engineering artifacts
• Code
• Design documents (DD), requirements specifications (SRS), test plans (TP), ...
Focused on the artifact, not on the author
Different levels of formality:
• Inspections, reviews, walkthroughs, code reads, or simply explaining the problem to a colleague
• Terminology varies a lot (e.g., McConnell uses the term “reviews” generically)
• Typically involve pre-release of the artifact to the reviewers, and a moderated meeting to discuss the results of the reviews
Effective at detecting faults early
• NASA SEL study: 3.3 defects found per hour of effort for code reads, compared to 1.8 defects per hour for testing
Checklists
Keep reviews focused, uniform, and manageable
• Based on similar systems, past experience
• Items stated objectively
Example [Sommerville]
Data faults:
  Are all program variables initialized before use?
  Have all constants been named?
  Should the upper bound of the array be equal to size or size – 1?
  If character strings are used, is a delimiter assigned?
  Is there any possibility of buffer overflow?
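For instance, the array-bound item above targets this classic off-by-one defect (a hypothetical fragment):

```java
public class BoundsExample {
    public static void main(String[] args) {
        int size = 4;                 // hypothetical
        int[] a = new int[size];
        // Defect the checklist item catches: "<=" overruns the array
        // and throws ArrayIndexOutOfBoundsException at i == size:
        // for (int i = 0; i <= size; i++) a[i] = 0;
        for (int i = 0; i < size; i++) a[i] = 0;  // last valid index is size - 1
    }
}
```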
Purpose of manual checks
Multiple purposes
• Find different defects than other SQA techniques
  Examine artifacts that are beyond the scope of other SQA techniques
  Based on the idea that different people have different “blind spots”
• Disseminate corporate culture
  “The way we do things around here”
• Measure quality, monitor progress
  “If you want something to happen, measure it” – Andrew Grove
• Due to subjectivity and incompleteness, should NOT be used to evaluate the author’s performance
Nevertheless, reviews do encourage quality work, indirectly
Manual check processes
• Roles: moderator, author, reviewer, scribe
  Walkthroughs: the author can be the moderator
  Reviews: the author can present the item
  Inspections: the artifact should speak for itself; marginally more effective, but require more practice
• Preparation: individual
  The artifact is released in advance to each reviewer
  Reviewers look for defects
• Meeting: moderated to proceed at optimal speed
  Don’t discuss solutions: the purpose is fault detection, not correction
  Never be judgmental (“Everyone knows it’s more efficient to loop from n to 0”)
• Record: type and severity of errors; cost statistics
• Informal meetings off-line
• The author decides what to do about each defect
Fagan’s inspections
Steps
• Overview
• Preparation
• Meeting
• Rework
• Follow-up
Objective: finding errors, deviations, inefficiencies
• But not fixing any of these
Problem: lengthy preparation and inspection meeting phases
Parnas’ “active design review”
Questionnaire testing the understanding of each reviewer
No general meeting
• Individual meetings between author and each reviewer
Prototyping
Simplified version of the system for evaluation with a user or manager
• Evolutionary vs. throw-away prototypes
• Horizontal vs. vertical prototypes
Evolutionary prototyping
Process
• Develop initial implementation
• Expose implementation to user comments
• Enhance the implementation
• Repeat (comment-enhance) until a satisfactory system is obtained
Address so-called “wicked problems”
• Where requirements are discovered only as the system is developed
• E.g. chess-playing program
Downsides
• Absence of quantifiable deliverables
• Maintenance problems
Throw-away prototyping
Extend the requirements analysis with the production of a prototype
• For evaluation purposes only (will not be incorporated in the system)
• Examples: UI-only, calibrated stubs (sketched below)
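A calibrated stub might return canned data while reproducing the measured timing of the real component; a sketch (the class name and the 40 ms figure are hypothetical):

```java
// Throw-away prototyping with a calibrated stub: fake the answer,
// but keep the latency realistic so performance risks still show up.
public class RemoteDataStub {
    public int[] fetchReadings() throws InterruptedException {
        Thread.sleep(40);                // calibrated to measured latency
        return new int[] {12, 5, 9};     // canned response
    }
}
```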
Benefits
• Clarify requirements
• Reduce process risks
  Technical risks (performance, feasibility, …)
  Suitability risks (functionality, ease of use, …)
Downsides
• Can be misleading as it usually leaves out many features
• Cannot be part of the “contract”
• Cannot really capture reliability requirements
Prototype tests
Horizontal prototype: UI
• Validate the requirements
Vertical prototype: a complete use case
• Think horizontally: abstraction
• Do vertically: use case, functional requirement, project risk
Example: embedded server
[Diagram: horizontal layers (Application layer, OS layer, Communications, Device I/O) crossed by vertical slices (Acquire remote data, Display remote data, Closed-loop control, …, Performance tuning)]
Software reliability
Probability of failure-free operation
• MTTF = mean time to failure (a.k.a. MTBF)
• Failure intensity (ROCOF, rate of occurrence of failures) = number of failures per time unit = 1/MTTF (if the system is not changed)
• Probability of availability (AVAIL) = MTTF / (MTTF + repair time + other downtime)
Reliability depends on the operational profile and number of defects
• ROCOF(t) = Σ over features x of (probability of using x at time t) * ROCOF_x(t)
• ROCOF_x(t) = (failure intensity per defect) * (# of defects in x)
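A tiny worked example with hypothetical figures: MTTF = 500 h, mean repair time = 8 h, and other downtime = 2 h per failure give

\[
\mathrm{ROCOF} = \frac{1}{500} = 0.002~\text{failures/h}, \qquad
\mathrm{AVAIL} = \frac{500}{500 + 8 + 2} \approx 0.98
\]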
Software reliability
Steps in determining software reliability
• Determine the operational profile (probable pattern of usage)
  Or, collect operational data; profiles are reusable, e.g. phone connection data
• Select a set of test data that matches the operational profile
  The number of test cases in a given class should be proportional to the likelihood of inputs in that class
• Apply the test cases to the program
  Accelerated testing: virtual time vs. use time (raw time, real time)
  Record the time to failure
• Compute reliability on a statistically significant sample
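A minimal sketch of profile-proportional test selection (feature names and probabilities are hypothetical):

```java
import java.util.Random;

// Allocate test cases in proportion to the operational profile:
// a feature used 70% of the time gets ~70% of the test cases.
public class ProfileSampler {
    static final String[] FEATURES = {"connect", "transfer", "disconnect"};
    static final double[] PROB     = {0.70, 0.25, 0.05};   // sums to 1

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] counts = new int[FEATURES.length];
        for (int t = 0; t < 1000; t++) {            // 1000 test cases
            double u = rnd.nextDouble(), acc = 0;
            for (int f = 0; f < PROB.length; f++) { // cumulative sampling
                acc += PROB[f];
                if (u < acc) { counts[f]++; break; }
            }
        }
        for (int f = 0; f < FEATURES.length; f++)
            System.out.println(FEATURES[f] + ": " + counts[f] + " tests");
    }
}
```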
Release decision
When should we stop testing?
• Never
• When we run out of resources (time & budget)
• When we have achieved complete coverage of our test criteria
• When we hit targeted reliability estimate
Statistics on defects left in code
• Industry average: 15..50 defects/KLOC (including code produced using bad development practices)
• Best practices: 1..5 defects/KLOC
  It is cheaper to build high-quality software than to fix low-quality software
• Reduced rates (0.1..0.5 defects/KLOC) for combinations of QA techniques and for the “cleanroom process”
  Justified in special applications; extra effort is necessary
Reliability growth models
Predict how software reliability should improve over time as faults are discovered and repaired
Reliability growth models
• Equal steps: reliability grows by sudden jumps, by a constant amount after fixing each fault
• Normally distributed steps: non-constant jumps
  Negative steps: reliability might actually decrease after fixing a fault
• Continuous models: focus on time as opposed to discrete steps
  Recognize that it is increasingly difficult to find new faults
  Require calibration for the type of application and the target reliability
• No universally applicable model
  Highly dependent on the type of application, programming language, development process, testing/QA process
Reliability growth models
Exponential model
• Fault detection probability
  Probability density of finding a fault at time t: f(t) = f0 * e^(-f0*t)
  Fault detection rate (FDR) = f(t) * (initial # of defects)
  Cumulative distribution: F(t) = 1 - e^(-f0*t)
• Sanity checks (simple assumptions / first approximation)
  F(0) = 0; F(infinity) = 1
  f(0) = f0; f(infinity) = 0
Reliability growth models
Consequences of the exponential model
• Total (initial) number of faults: N = (# faults found) / F(t)
• Remaining faults = N * (1 - F(t)) = N * e^(-f0*t)
  Rate of finding faults at time t = N * f(t) = f0 * (remaining faults)
• Time (effort) to find a fault = 1/(N * f(t)) = (1/N) * e^(f0*t)/f0
  Inversely proportional to the number of remaining faults; exponential in t
Compare to a linear (basic) model: f(t) = f0 up to time 1/f0 and 0 afterwards, so F(t) = f0 * t; there, the time to find a fault is 1/(N * f0), a constant
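A sketch of the bookkeeping these formulas imply (f0 and the counts below are hypothetical; in practice both need calibration):

```java
// Estimate total and remaining faults from the exponential model.
public class ReliabilityEstimate {
    public static void main(String[] args) {
        double f0 = 0.05;    // per-fault detection rate (assumed)
        double t  = 20.0;    // elapsed QA time units
        int found = 120;     // faults found so far

        double F = 1 - Math.exp(-f0 * t);   // fraction found by time t
        double N = found / F;               // estimated initial faults
        double remaining = N - found;       // N * (1 - F(t))
        double timePerFault = 1 / (N * f0 * Math.exp(-f0 * t));

        System.out.printf("N ~ %.0f, remaining ~ %.0f, time/fault ~ %.2f%n",
                          N, remaining, timePerFault);
    }
}
```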
To probe further: Musa, Ackerman. “Quantifying Software Validation: When to Stop Testing?”. IEEE Software, May 1989
• More detailed models (log Poisson distribution)
Reliability growth models
Identifying the parameters
• Interpolation (curve fitting)
• Fault prediction based on past data
[Chart: faults found per week on a log scale (1 to 1000) vs. normalized QA week 1-15, showing the interpolated curve, the predicted curve, and the target rate]
References
SQA basics
• BD 9.1-9.2
Reviews and inspections
• BD 3.3.4, 3.4.4, 9.2.2 (p. 333), 9.4.1
• Sommerville 19.2
• McConnell ch. 23
Debugging
• McConnell ch. 26
Reliability
• Sommerville 21.2
Prototyping
• Sommerville ch. 8
• BD p. 124