04/22/23 Dr Andy Brooks 1
MSc Software Testing and Maintenance (MSc Prófun og viðhald hugbúnaðar)
Fyrirlestrar 41 & 42 (Lectures 41 & 42): Comparing Bug Finding Tools
Can you detect me?
Case Study (Dæmisaga)

Reference: Stefan Wagner, Jan Jürjens, Claudia Koller, and Peter Trischberger, “Comparing Bug Finding Tools with Reviews and Tests”, Institut für Informatik, Technische Universität München, 2005. http://www4.in.tum.de/publ/papers/SWJJCKPT05.pdf
1. Introduction
• Software quality assurance accounts for around 50% of the development time.
• Defect-detection techniques need to be improved and costs reduced.
• There are a number of automated static analysis tools called bug finding tools.
• Faults are the cause of failures in code.
Problem
1. Which kinds of defects are found by bug finding tools, reviews, and testing?
2. Are the same or different defects found?
   • How much overlap is there between the different techniques?
3. Do the static analysis tools produce too many false positives?
• Bug reports that are not actually bugs...
1. Introduction
Results
1. Bug finding tools detect only a subset of the kinds of defects that reviews find.
2. The tools are better regarding the bug patterns they are programmed for.
3. Testing finds completely different defects than bug finding tools.
4. Bug finding tools produce many more false positives than true positives.
5. Results of applying bug finding tools vary according to the project being studied.
1. Introduction
Consequences
1. Testing or reviews cannot be substituted by bug finding tools.
2. Bug finding tools could be usefully run before conducting reviews.
3. The false positive ratio from bug finding tools needs to be lowered to realise reductions in defect-detection effort.
4. Tools should be more tolerant of programming style and design.
1. Introduction
Experimental Setup
• Five Java projects
  – 4 industrial (telecomms company O2)
    • web information systems
  – 1 university (Technische Universität München)
  – projects in use or in final testing
  – projects have an interface to a relational database
• Java bug finding tools and testing were applied to all 5 projects.
1. Introduction
Experimental Setup
• A review was applied to only one project.
• Reports from the bug finding tools were classified as true and false positives by experienced developers.
• Defects were classified by:
  – severity
  – type
1. Introduction
Techniques used by tools
• Bug patterns are based on experience and known pitfalls in a programming language.
• Readability is checked based on coding guidelines and standards.
• Dataflow and controlflow analysis.
• Code annotations to allow extended static checking/model checking.
  – code annotation tools ignored in this study
2. Bug Finding Tools
The Java bug finding tools
• FindBugs Version 0.8.1
  – bug patterns & dataflow analysis
    • can detect unused variables
  – analyses bytecode
• PMD Version 1.8
  – coding standards
    • can detect empty try/catch blocks
    • can detect classes with high cyclomatic complexity
• QJ Pro Version 2.1
  – uses over 200 rules
    • can detect variable names that are too long
    • can detect an imbalance between code and commentary lines
2. Bug Finding Tools
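The bug patterns named above can be illustrated with short Java fragments. This is a minimal sketch: the class and method names are invented for illustration, and the comments describe the kind of warning a tool such as FindBugs or PMD would typically issue, not actual tool output.

```java
// Illustrative fragments of patterns that Java bug finding tools flag.
public class BugPatternExamples {

    // FindBugs-style dataflow finding: a local variable is written but never read.
    static int unusedVariable() {
        int unused = computeSomething(); // dead store: the value is never used
        return 42;
    }

    // PMD-style coding-standard finding: an empty catch block swallows the error.
    static void emptyCatch(String s) {
        try {
            Integer.parseInt(s);
        } catch (NumberFormatException e) {
            // empty catch block: the failure is silently ignored
        }
    }

    static int computeSomething() { return 7; }
}
```

Both fragments compile and run without error, which is exactly why static analysis (rather than testing) is needed to notice them.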
3. Projects
• Project A
  – online shop
  – software in use for 6 months
  – 1066 Java classes, over 58 KLOC
• Project B
  – pay for goods
  – not operational at time of study
  – 215 Java classes, over 24 KLOC
• Project C
  – frontend for a file converter
  – software in use for 3 months
  – over 3 KLOC plus JSP code
3. Projects
• Project D
  – data manager
  – J2EE application
  – 572 classes, over 34 KLOC
• EstA
  – non-industrial, requirements editor
  – not extensively used
  – over 4 KLOC
4.1 General
• Bug finding tools were used on all 5 projects.
• Black-box and white-box testing of all 5 projects.
• One review (Project C).
• Techniques were used completely independently.
• Warnings from the tools are called positives, and experienced developers classified them as true positives or false positives.
4. Approach
4.1 General
• Validity threats include:
  – one review is not representative of reviews
  – only 3 bug finding tools were used
    • there are many more and results might be different
  – testing of the mature projects did not reveal many faults
    • too little data to make accurate statistical inferences
  – only 5 projects were analysed
    • more experiments are necessary
4. Approach
4.2 Defect Categorisation
1. Defects that lead to a crash.
2. Defects that cause a logical failure.
3. Defects with insufficient error handling.
4. Defects that violate the principles of structured programming.
5. Defects that reduce code maintainability.
4. Approach
Table 1, Section 5 Analysis (table not reproduced)
* over all projects
Observations and Interpretations
• Most of the true positives are Category 5 (code maintainability).
• Different tools find different positives.
  – only one defect type was found across all tools*
• FindBugs is the only tool to find positives across all defect categories 1 through 5.
• FindBugs detects the largest number of defect types, QJ Pro the fewest.
5.1 Bug Finding Tools
Observations and Interpretations
• True positive detection is diverse.
  – For the defect type common to all tools, FindBugs finds only 4 true positives, PMD finds 29, and QJ Pro finds 30.
• FindBugs and PMD have lower false positive ratios than QJ Pro.
  – Because all warnings have to be examined, QJ Pro is not efficient.
5.1 Bug Finding Tools
FindBugs  PMD   QJ Pro  Total
0.47      0.31  0.96    0.66
Table 2. Average ratios of false positives for each tool and in total.
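The ratios in Table 2 follow from dividing the false positives by all warnings issued. A minimal sketch of that arithmetic, with invented warning counts (the paper reports only the resulting ratios):

```java
// False positive ratio = false positives / (false positives + true positives).
public class FalsePositiveRatio {

    static double ratio(int falsePositives, int truePositives) {
        return (double) falsePositives / (falsePositives + truePositives);
    }

    public static void main(String[] args) {
        // e.g. a tool issuing 96 warnings of which 92 are spurious
        // yields a ratio near QJ Pro's 0.96
        System.out.println(ratio(92, 4));
    }
}
```

A high ratio means nearly every warning a developer examines turns out to be noise, which is why the paper judges QJ Pro inefficient.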
Observations and Interpretations
• Efficiency of the tools varied across projects.
  – For the Category 1 defect (“Database connection not closed”), FindBugs issued true positives for projects B and D but 46 false positives for project A.
  – Ignoring Category 5 defects, detection rates of true positives decrease for projects A and D with the other two tools.
• Recommending a single tool is difficult.
  – QJ Pro is the least efficient.
  – FindBugs and PMD should be used in combination.
    • FindBugs finds many different defect types.
    • PMD has accurate results for Category 5 defects.
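The Category 1 defect “Database connection not closed” can be sketched in Java. This is an illustrative fragment, not code from the studied projects; the JDBC URL and the `query` helper are invented.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Sketch of the "Database connection not closed" defect and its fix.
public class ConnectionLeak {

    // Defective pattern: if query() throws, close() is never reached
    // and the connection leaks.
    static void leaky() throws SQLException {
        Connection con = DriverManager.getConnection("jdbc:example://host/db");
        query(con);
        con.close(); // skipped when query() throws
    }

    // Fixed pattern: try-with-resources closes the connection on every path.
    static void safe() throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:example://host/db")) {
            query(con);
        }
    }

    static void query(Connection con) throws SQLException { /* ... */ }
}
```

Leaks like this only bite under load, which matches the paper's later remark that stress testing might have revealed them.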
5.1 Bug Finding Tools
5.2 Bug Finding Tools vs. Review
• An informal review was performed on project C with three developers.
  – no preparation
  – the code author was a reviewer
  – code was inspected at the review meeting
  – 19 different types of defects were found
This variable is initialised but not used.
Section 5.2 table (not reproduced)
Observations and Interpretations
• All defect types found by the tools were also found by the review of project C:
  – “Variable initialised but not used”
    • The tools found 7 such defects; the review found only one.
  – “Unnecessary if clause”
    • The review found 8 such defects.
      – One was an if-clause with no further computation.
      – 7 required investigation of program logic.
    • The tools found only one: the if-clause with no further computation.
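The “Unnecessary if clause” defect type can be sketched as follows. A minimal illustration with invented names; the defective version has a branch that performs no further computation, which is the one variant the tools could catch.

```java
// Sketch of the "Unnecessary if clause" defect type.
public class UnnecessaryIf {

    // Defective: the condition is evaluated, but the branch does nothing.
    static int before(int x) {
        if (x > 0) {
            // no further computation in this branch
        }
        return x * 2;
    }

    // Fixed: the dead branch is removed; behaviour is unchanged.
    static int after(int x) {
        return x * 2;
    }
}
```

The other 7 instances the review found required reasoning about program logic, which is precisely what purely syntactic pattern matching cannot do.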
5.2 Bug Finding Tools vs Review
Observations and Interpretations
• But 17 additional defect types were found in the review, some of which could have been found by tools but were not:
  – “Database connection is not closed” was not found by the tools.
  – FindBugs is generally able to detect “String concatenated inside loop with ‘+’” but did not.
    • avoiding this pattern avoids creating unnecessary and unreferenced String objects
• Defect types such as “Wrong result” cannot be found by static tools but can be found in a review by manually executing a test case through the code.
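The “String concatenated inside loop with ‘+’” pattern looks like this in Java. A minimal sketch with invented method names: both versions return the same string, but the flagged one allocates a throwaway String on every iteration.

```java
// Sketch of the FindBugs pattern "String concatenated inside loop with +".
public class LoopConcat {

    // Flagged pattern: each += builds a new String, copying all characters
    // accumulated so far (quadratic work overall).
    static String withPlus(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p; // new String object on every iteration
        }
        return result;
    }

    // Preferred pattern: one mutable buffer, appended in place.
    static String withBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }
}
```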
5.2 Bug Finding Tools vs Review
Observations and Interpretations
• By finding more defect types, the review of project C can be thought of as more successful than any tool.
• Perhaps it is beneficial to use a bug finding tool first because automated static analysis is cheap.
  – But bug finding tools produce many false positives, and the work involved in assessing a positive as false might outweigh the benefits of automated static analysis.
5.2 Bug Finding Tools vs Review
5.3 Bug Finding Tools vs. Testing
• Several hundred test cases were executed.
• Black-box test cases were based on the textual specifications and the experience of the testers.
  – equivalence partitioning
  – boundary value analysis
• White-box test cases involved path testing.
  – Path selection criteria are not specified.
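Equivalence partitioning and boundary value analysis can be sketched on a small example. The `isEligible` routine and its 18–65 valid range are invented for illustration, not taken from the studied projects.

```java
// Sketch of black-box test design: one case per equivalence partition,
// plus cases at the boundaries of the valid range.
public class BlackBoxDesign {

    // Hypothetical routine under test: accepts ages 18..65 inclusive.
    static boolean isEligible(int age) {
        return age >= 18 && age <= 65;
    }

    public static void main(String[] args) {
        // Equivalence partitions: below the range, inside it, above it.
        System.out.println(isEligible(10)); // false
        System.out.println(isEligible(40)); // true
        System.out.println(isEligible(80)); // false
        // Boundary values: just outside and just inside each edge.
        System.out.println(isEligible(17) + " " + isEligible(18));
        System.out.println(isEligible(65) + " " + isEligible(66));
    }
}
```

Off-by-one defects cluster at the edges, which is why boundary values are tested in addition to one representative per partition.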
5.3 Bug Finding Tools vs. Testing
• A coverage tool checked test set quality.
  – Coverage was high apart from project C.
  – “In all the other projects, class coverage was nearly 100%, method coverage was also in that area and line coverage lay between 60% and 93%.”
• No stress tests were executed.
  – This “might have changed the results significantly”.
• Defects were found only for project C and project EstA.
  – Other projects were “probably too mature”.
Observations and Interpretations
• Dynamic testing found defects in Categories 1, 2, and 3, but not 4 or 5.
  – Category 5 defects are not detectable by dynamic testing.
• Dynamic testing of project C and project EstA found completely different defects to those found by the bug finding tools.
• Stress testing might have revealed the database connections that were not closed.
• “Therefore, we again recommend using both techniques in a project.”
5.3 Bug Finding Tools vs. Testing
5.4 Defect Removal Efficiency
• The total number of defects is unknown but can be estimated using all the defects found so far.
• Without regard to severity of defect, efficiency is poor for tests and good for the bug finding tools.
(Only one defect was found in common, between the review and the tools.)
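The efficiency calculation above can be sketched as simple arithmetic: each technique's defect count is divided by an estimate of the total, here taken as all distinct defects found by any technique. The counts below are invented for illustration, except the 19 defect types from the review and the single overlap the study reports.

```java
// Sketch of defect removal efficiency as estimated in Section 5.4.
public class RemovalEfficiency {

    // found / estimated total, where the total is estimated from
    // all defects found so far by any technique.
    static double efficiency(int foundByTechnique, int estimatedTotal) {
        return (double) foundByTechnique / estimatedTotal;
    }

    public static void main(String[] args) {
        // Hypothetical counts: 12 by tools, 3 by tests, 19 by the review,
        // minus the 1 defect found in common between review and tools.
        int estimatedTotal = 12 + 3 + 19 - 1;
        System.out.println(efficiency(19, estimatedTotal)); // review's share
    }
}
```

Because the true defect total is unknown, these efficiencies are only relative measures for comparing the techniques against each other.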
5.4 Defect Removal Efficiency
• With regard to severity of defect, tests and reviews are “far more efficient in finding defects of the categories 1 and 2 than the bug finding tools”.
6. Discussion
• The results are not too surprising:
  – Static tools, with no model checking capabilities, are limited and cannot verify program logic.
  – Reviews and tests can verify program logic.
• Perhaps surprising is that not a single defect was detected by both the tools and testing.
  – Few defects, however, were found during testing since most of the projects were mature and already in operation. This may explain the lack of overlap.
6. Discussion
• “A rather disillusioning result is the high ratio of false positives that are issued by the tools.”
  – The benefits of automated detection are outweighed by the need to manually determine that a positive is false.
• No cost/benefit analysis performed in this study.
6. Discussion
• Some bug finding tools make use of additional annotations that permit some checks of logic.
  – The number of false positives could be reduced.
  – Category 1 and 2 defect detection could be increased.
  – But the savings could be outweighed by the need to add annotations to the source code.
8. Conclusions
• This work is not a comprehensive empirical study and provides only “first indications” of the effectiveness of bug finding tools relative to other techniques.
  – Further experimental work is needed.
  – Cost/benefit models need to be built.
8. Conclusions
• Bug finding tools find:
  – different defects than testing
  – a subset of the types a review finds
• Bug finding tool effectiveness varied from project to project.
  – Probably because of differences in the programming style and design in use.
• Andy asks: how should we incorporate the idea of maintainability into static analysis tools?
8. Conclusions
• If the number of false positives were much lower, it would be safe to recommend using bug finding tools, reviews, and testing in a combined approach.
  – “It probably costs more time to resolve the false positives than is saved by the automation using the tools.”
Looks like another false positive and another two minutes of my time wasted...