04/22/23 Dr Andy Brooks 1
MSc Software Testing and Maintenance (MSc Prófun og viðhald hugbúnaðar)
Fyrirlestrar 41 & 42 (Lectures 41 & 42): Comparing Bug Finding Tools
Can you detect me?
Case Study (Dæmisaga)

Reference: Stefan Wagner, Jan Jürjens, Claudia Koller, and Peter Trischberger, “Comparing Bug Finding Tools with Reviews and Tests”, Institut für Informatik, Technische Universität München, 2005. http://www4.in.tum.de/publ/papers/SWJJCKPT05.pdf
1. Introduction
• Software quality assurance accounts for around 50% of the development time.
• Defect-detection techniques need to be improved and costs reduced.
• There are a number of automated static analysis tools called bug finding tools.
• Faults are the cause of failures in code.
Problem
1. Which kinds of defects are found by bug finding tools, reviews, and testing?
2. Are the same or different defects found?
   • How much overlap is there between the different techniques?
3. Do the static analysis tools produce too many false positives?
• Bug reports that are not actually bugs...
1. Introduction
Results
1. Bug finding tools detect only a subset of the kinds of defects that reviews find.
2. The tools are better regarding the bug patterns they are programmed for.
3. Testing finds completely different defects than bug finding tools.
4. Bug finding tools produce many more false positives than true positives.
5. Results of applying bug finding tools vary according to the project being studied.
1. Introduction
Consequences
1. Testing or reviews cannot be substituted by bug finding tools.
2. Bug finding tools could be usefully run before conducting reviews.
3. The false positive ratio from bug finding tools needs to be lowered to realise reductions in defect-detection effort.
4. Tools should be more tolerant of programming style and design.
1. Introduction
Experimental Setup
• Five Java projects
  – 4 industrial (telecomms company O2)
    • web information systems
  – 1 university (Technische Universität München)
  – projects in use or in final testing
  – projects have an interface to a relational database
• Java bug finding tools and testing were applied to all 5 projects.
1. Introduction
Experimental Setup
• A review was applied to only one project.
• Reports from the bug finding tools were classified as true and false positives by experienced developers.
• Defects were classified by:
  – severity
  – type
1. Introduction
Techniques used by tools
• Bug patterns are based on experience and known pitfalls in a programming language.
• Readability is checked based on coding guidelines and standards.
• Dataflow and controlflow analysis.
• Code annotations to allow extended static checking/model checking.
  – code annotation tools ignored in this study
2. Bug Finding Tools
The Java bug finding tools
• FindBugs Version 0.8.1
  – bug patterns & dataflow analysis
    • can detect unused variables
  – analyses bytecode
• PMD Version 1.8
  – coding standards
    • can detect empty try/catch blocks
    • can detect classes with high cyclomatic complexity
• QJ Pro Version 2.1
  – uses over 200 rules
    • can detect variable names that are too long
    • can detect an imbalance between code and commentary lines
2. Bug Finding Tools
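The bug patterns named above can be illustrated with short Java fragments. This is a minimal sketch: the class and method names are invented for illustration, and the comments describe the kind of warning a tool such as FindBugs or PMD would typically issue, not actual tool output.

```java
// Illustrative fragments of patterns that Java bug finding tools flag.
public class BugPatternExamples {

    // FindBugs-style dataflow finding: a local variable is written but never read.
    static int unusedVariable() {
        int unused = computeSomething(); // dead store: the value is never used
        return 42;
    }

    // PMD-style coding-standard finding: an empty catch block swallows the error.
    static void emptyCatch(String s) {
        try {
            Integer.parseInt(s);
        } catch (NumberFormatException e) {
            // empty catch block: the failure is silently ignored
        }
    }

    static int computeSomething() { return 7; }
}
```

Both fragments compile and run without error, which is exactly why static analysis (rather than testing) is needed to notice them.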
3. Projects
• Project A
  – online shop
  – software in use for 6 months
  – 1066 Java classes, over 58 KLOC
• Project B
  – pay for goods
  – not operational at time of study
  – 215 Java classes, over 24 KLOC
• Project C
  – frontend for a file converter
  – software in use for 3 months
  – over 3 KLOC plus JSP code
3. Projects
• Project D
  – data manager
  – J2EE application
  – 572 classes, over 34 KLOC
• EstA
  – non-industrial, requirements editor
  – not extensively used
  – over 4 KLOC
4.1 General
• Bug finding tools were used on all 5 projects.
• Black-box and white-box testing of all 5 projects.
• One review (Project C).
• Techniques were used completely independently.
• Warnings from the tools are called positives, and experienced developers classified them as true positives or false positives.
4. Approach
4.1 General
• Validity threats include:
  – one review is not representative of reviews
  – only 3 bug finding tools were used
    • there are many more and results might be different
  – testing of the mature projects did not reveal many faults
    • too little data to make accurate statistical inferences
  – only 5 projects were analysed
    • more experiments are necessary
4. Approach
4.2 Defect Categorisation
1. Defects that lead to a crash.
2. Defects that cause a logical failure.
3. Defects with insufficient error handling.
4. Defects that violate the principles of structured programming.
5. Defects that reduce code maintainability.
4. Approach
Table 1, Section 5 Analysis (table not reproduced)
* over all projects
Observations and Interpretations
• Most of the true positives are Category 5 (code maintainability).
• Different tools find different positives.
  – only one defect type was found across all tools*
• FindBugs is the only tool to find positives across all defect categories 1 through 5.
• FindBugs detects the largest number of defect types, QJ Pro the fewest.
5.1 Bug Finding Tools
Observations and Interpretations
• True positive detection is diverse.
  – For the defect type common to all tools, FindBugs finds only 4 true positives, PMD finds 29, and QJ Pro finds 30.
• FindBugs and PMD have lower false positive ratios than QJ Pro.
  – Because all warnings have to be examined, QJ Pro is not efficient.
5.1 Bug Finding Tools
FindBugs  PMD   QJ Pro  Total
0.47      0.31  0.96    0.66
Table 2. Average ratios of false positives for each tool and in total.
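The ratios in Table 2 follow from dividing the false positives by all warnings issued. A minimal sketch of that arithmetic, with invented warning counts (the paper reports only the resulting ratios):

```java
// False positive ratio = false positives / (false positives + true positives).
public class FalsePositiveRatio {

    static double ratio(int falsePositives, int truePositives) {
        return (double) falsePositives / (falsePositives + truePositives);
    }

    public static void main(String[] args) {
        // e.g. a tool issuing 96 warnings of which 92 are spurious
        // yields a ratio near QJ Pro's 0.96
        System.out.println(ratio(92, 4));
    }
}
```

A high ratio means nearly every warning a developer examines turns out to be noise, which is why the paper judges QJ Pro inefficient.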
Observations and Interpretations
• Efficiency of the tools varied across projects.
  – For the Category 1 defect (“Database connection not closed”), FindBugs issued true positives for projects B and D but 46 false positives for project A.
  – Ignoring Category 5 defects, detection rates of true positives decrease for projects A and D with the other two tools.
• Recommending a single tool is difficult.
  – QJ Pro is the least efficient.
  – FindBugs and PMD should be used in combination.
    • FindBugs finds many different defect types.
    • PMD has accurate results for Category 5 defects.
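The Category 1 defect “Database connection not closed” can be sketched in Java. This is an illustrative fragment, not code from the studied projects; the JDBC URL and the `query` helper are invented.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Sketch of the "Database connection not closed" defect and its fix.
public class ConnectionLeak {

    // Defective pattern: if query() throws, close() is never reached
    // and the connection leaks.
    static void leaky() throws SQLException {
        Connection con = DriverManager.getConnection("jdbc:example://host/db");
        query(con);
        con.close(); // skipped when query() throws
    }

    // Fixed pattern: try-with-resources closes the connection on every path.
    static void safe() throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:example://host/db")) {
            query(con);
        }
    }

    static void query(Connection con) throws SQLException { /* ... */ }
}
```

Leaks like this only bite under load, which matches the paper's later remark that stress testing might have revealed them.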
5.1 Bug Finding Tools
5.2 Bug Finding Tools vs. Review
• An informal review was performed on project C with three developers.
  – no preparation
  – the code author was a reviewer
  – code was inspected at the review meeting
  – 19 different types of defects were found
This variable is initialised but not used.
Section 5.2 table (not reproduced)
Observations and Interpretations
• All defect types found by the tools were also found by the review of project C:
  – “Variable initialised but not used”
    • The tools found 7 such defects; the review found only one.
  – “Unnecessary if clause”
    • The review found 8 such defects.
      – One was an if-clause with no further computation.
      – 7 required investigation of program logic.
    • The tools found only one: the if-clause with no further computation.
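The “Unnecessary if clause” defect type can be sketched as follows. A minimal illustration with invented names; the defective version has a branch that performs no further computation, which is the one variant the tools could catch.

```java
// Sketch of the "Unnecessary if clause" defect type.
public class UnnecessaryIf {

    // Defective: the condition is evaluated, but the branch does nothing.
    static int before(int x) {
        if (x > 0) {
            // no further computation in this branch
        }
        return x * 2;
    }

    // Fixed: the dead branch is removed; behaviour is unchanged.
    static int after(int x) {
        return x * 2;
    }
}
```

The other 7 instances the review found required reasoning about program logic, which is precisely what purely syntactic pattern matching cannot do.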
5.2 Bug Finding Tools vs Review
Observations and Interpretations
• But 17 additional defect types were found in the review, some of which could have been found by tools but were not:
  – “Database connection is not closed” was not found by the tools.
  – FindBugs is generally able to detect “String concatenated inside loop with ‘+’” but did not.
    • avoiding this pattern avoids creating unnecessary and unreferenced String objects
• Defect types such as “Wrong result” cannot be found by static tools but can be found in a review by manually executing a test case through the code.
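The “String concatenated inside loop with ‘+’” pattern looks like this in Java. A minimal sketch with invented method names: both versions return the same string, but the flagged one allocates a throwaway String on every iteration.

```java
// Sketch of the FindBugs pattern "String concatenated inside loop with +".
public class LoopConcat {

    // Flagged pattern: each += builds a new String, copying all characters
    // accumulated so far (quadratic work overall).
    static String withPlus(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p; // new String object on every iteration
        }
        return result;
    }

    // Preferred pattern: one mutable buffer, appended in place.
    static String withBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }
}
```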
5.2 Bug Finding Tools vs Review
Observations and Interpretations
• By finding more defect types, the review of project C can be thought of as more successful than any tool.
• Perhaps it is beneficial to use a bug finding tool first because automated static analysis is cheap.
  – But bug finding tools produce many false positives, and the work involved in assessing a positive as false might outweigh the benefits of automated static analysis.
5.2 Bug Finding Tools vs Review
5.3 Bug Finding Tools vs. Testing
• Several hundred test cases were executed.
• Black-box test cases were based on the textual specifications and the experience of the testers.
  – equivalence partitioning
  – boundary value analysis
• White-box test cases involved path testing.
  – Path selection criteria are not specified.
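Equivalence partitioning and boundary value analysis can be sketched on a small example. The `isEligible` routine and its 18–65 valid range are invented for illustration, not taken from the studied projects.

```java
// Sketch of black-box test design: one case per equivalence partition,
// plus cases at the boundaries of the valid range.
public class BlackBoxDesign {

    // Hypothetical routine under test: accepts ages 18..65 inclusive.
    static boolean isEligible(int age) {
        return age >= 18 && age <= 65;
    }

    public static void main(String[] args) {
        // Equivalence partitions: below the range, inside it, above it.
        System.out.println(isEligible(10)); // false
        System.out.println(isEligible(40)); // true
        System.out.println(isEligible(80)); // false
        // Boundary values: just outside and just inside each edge.
        System.out.println(isEligible(17) + " " + isEligible(18));
        System.out.println(isEligible(65) + " " + isEligible(66));
    }
}
```

Off-by-one defects cluster at the edges, which is why boundary values are tested in addition to one representative per partition.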
5.3 Bug Finding Tools vs. Testing
• A coverage tool checked test set quality.
  – Coverage was high apart from project C.
  – “In all the other projects, class coverage was nearly 100%, method coverage was also in that area and line coverage lay between 60% and 93%.”
• No stress tests were executed.
  – This “might have changed the results significantly”.
• Defects were found only for project C and project EstA.
  – Other projects were “probably too mature”.
Observations and Interpretations
• Dynamic testing found defects in Categories 1, 2, and 3, but not 4 or 5.
  – Category 5 defects are not detectable by dynamic testing.
• Dynamic testing of project C and project EstA found completely different defects to those found by the bug finding tools.
• Stress testing might have revealed the database connections that were not closed.
• “Therefore, we again recommend using both techniques in a project.”
5.3 Bug Finding Tools vs. Testing
5.4 Defect Removal Efficiency
• The total number of defects is unknown but can be estimated using all the defects found so far.
• Without regard to severity of defect, efficiency is poor for tests and good for the bug finding tools.
(Only one defect was found in common, between the review and the tools.)
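The efficiency calculation above can be sketched as simple arithmetic: each technique's defect count is divided by an estimate of the total, here taken as all distinct defects found by any technique. The counts below are invented for illustration, except the 19 defect types from the review and the single overlap the study reports.

```java
// Sketch of defect removal efficiency as estimated in Section 5.4.
public class RemovalEfficiency {

    // found / estimated total, where the total is estimated from
    // all defects found so far by any technique.
    static double efficiency(int foundByTechnique, int estimatedTotal) {
        return (double) foundByTechnique / estimatedTotal;
    }

    public static void main(String[] args) {
        // Hypothetical counts: 12 by tools, 3 by tests, 19 by the review,
        // minus the 1 defect found in common between review and tools.
        int estimatedTotal = 12 + 3 + 19 - 1;
        System.out.println(efficiency(19, estimatedTotal)); // review's share
    }
}
```

Because the true defect total is unknown, these efficiencies are only relative measures for comparing the techniques against each other.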
5.4 Defect Removal Efficiency
• With regard to severity of defect, tests and reviews are “far more efficient in finding defects of the categories 1 and 2 than the bug finding tools”.
6. Discussion
• The results are not too surprising:
  – Static tools, with no model checking capabilities, are limited and cannot verify program logic.
  – Reviews and tests can verify program logic.
• Perhaps surprising is that not a single defect was detected by both the tools and testing.
  – Few defects, however, were found during testing since most of the projects were mature and already in operation. This may explain the lack of overlap.
6. Discussion
• “A rather disillusioning result is the high ratio of false positives that are issued by the tools.”
  – The benefits of automated detection are outweighed by the need to manually determine that a positive is false.
• No cost/benefit analysis performed in this study.
6. Discussion
• Some bug finding tools make use of additional annotations that permit some checks of logic.
  – The number of false positives could be reduced.
  – Category 1 and 2 defect detection could be increased.
  – But the savings could be outweighed by the need to add annotations to the source code.
8. Conclusions
• This work is not a comprehensive empirical study and provides only “first indications” of the effectiveness of bug finding tools relative to other techniques.
  – Further experimental work is needed.
  – Cost/benefit models need to be built.
8. Conclusions
• Bug finding tools find:
  – different defects than testing
  – a subset of the types a review finds
• Bug finding tool effectiveness varied from project to project.
  – Probably because of differences in the programming style and design in use.
• Andy asks: how should we incorporate the idea of maintainability into static analysis tools?
8. Conclusions
• If the number of false positives were much lower, it would be safe to recommend using bug finding tools, reviews, and testing in a combined approach.
  – “It probably costs more time to resolve the false positives than is saved by the automation using the tools.”
Looks like another false positive and another two minutes of my time wasted...