
Page 1: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

ⓒ KAIST SE LAB 2013

IEEE Transactions on Software Engineering (TSE 2006)

James H. Andrews, Lionel C. Briand, Yvan Labiche, and Akbar Siami Namin

2013-06-11, Yoo Jin Lim

Page 2: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Contents

Introduction
Related Work
Experimental Description
Analysis Results
Conclusion
Discussion


Page 3: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Introduction (1/4)

Software testing research: find effective testing techniques for software.


public void bar() {
    // print report
    if (report == null)   // fault: should be (report != null)
        println(report);
}

[Figure: fault taxonomy. Faults are either real or produced; produced faults are seeded either manually (hand-seeded) or automatically, via mutation analysis.]

Page 4: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Introduction (2/4)

Mutant Generation
• (+) Well-defined, scalable
• (–) Less realistic than hand-seeded faults

Mutation Analysis: the process of analyzing when mutants fail (i.e., are detected by tests)


Source code:
    for (int i = 1; i <= n; i++)
        fact = fact * i;

Mutant (mutation operator: replace <= with <):
    for (int i = 1; i < n; i++)
        fact = fact * i;

(A generation sketch follows.)
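As a toy illustration of this kind of mutant generation (not the tool used in the paper; the operator list and the function relational_mutants are invented for the example), a Python sketch that enumerates relational-operator replacement mutants of one line:

import re

# Hypothetical "relational operator replacement" class: swap each relational
# operator occurrence for every other operator in the class.
OPS = ["<=", ">=", "==", "!=", "<", ">"]   # longest operators first, so the
OP_RE = re.compile("|".join(re.escape(op) for op in OPS))  # regex prefers them

def relational_mutants(line):
    """Yield one mutant per (operator occurrence, replacement) pair."""
    for match in OP_RE.finditer(line):
        for other in OPS:
            if other != match.group():
                yield line[:match.start()] + other + line[match.end():]

src = "for (int i = 1; i <= n; i++) fact = fact * i;"
for mutant in relational_mutants(src):
    print(mutant)   # five mutants, including the "<" mutant shown above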

What is the external validity of mutants in assessing the fault detection effectiveness of testing techniques?

Page 5: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Introduction (3/4)

Test coverage criteria
• Data flow criteria examine how variables are used.
  – C-Use (Computation) coverage and P-Use (Predicate) coverage
• Control flow criteria examine logical expressions.
  – Block coverage and decision coverage

Subsumption Relationship*


[Figure: subsumption hierarchy. All-Use subsumes C-Use and P-Use; P-Use subsumes Decision; Decision subsumes Block.]

* Lori A. Clarke, Andy Podgurski, Debra J. Richardson, Steven J. Zeil, "A Comparison of Data Flow Path Selection Criteria," 8th ICSE, 1985.

Page 6: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Introduction (4/4)

Basis: The previous work (ICSE 2005)


• Generated mutants are similar to the real faults but different from hand-seeded faults, which are harder to detect than the real faults.

• Used randomly selected test suites.

Goal: The current work (TSE 2006)
• Extend to coverage-criteria-selected test suites.
• Investigate their cost-effectiveness using:
  – fault detection
  – test suite size
  – control/data flow coverage

Page 7: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Related Work

• Frankl & Weiss (1991): All-Use and Decision; subject programs of 33–60 LOC; 7 real faults; variables: fault detection, test suite size, and coverage.
• Hutchins et al. (1994): Def-Use and Decision; 141–512 LOC; 130 hand-seeded faults; variables: fault detection, test suite size, and coverage.
• Frankl & Iakounenko (1998): All-Use and Decision; 6,218 LOC; 33 real faults; variables: fault detection and coverage.
• This work (TSE 2006): Block, Decision, P-Use, and C-Use; 6,218 LOC; 34 real faults and 736 mutants; variables: fault detection, test suite size, and coverage.


Page 8: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Experimental Description (1/4)


[Figure: experimental variables (fault detection effectiveness, test suite size as cost, control/data flow coverage, mutation detection, random test suites) connected by numbered relationships corresponding to the research questions below.]

Q1: Are mutation scores good predictors of actual fault detection ratios?

Q2: What is the cost of achieving given levels of the coverage criteria?

Q3: What levels of coverage should be achieved to obtain reasonable levels of fault detection effectiveness?

Q4: What is the cost effectiveness of the control and data flow coverage criteria?

Page 9: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Experimental Description (2/4)


[Figure: the same variable diagram, extended with fault detection difficulty (relationships 5–7).]

Q5: What is the gain of using coverage criteria compared to random testing?

Q6: Do we still find a statistically significant relationship between coverage level and fault detection effectiveness when we account for test suite size?

Q7: How is the cost-benefit analysis of coverage criteria affected by variations in fault detection difficulty?

Page 10: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Experimental Description (3/4)

Subject Program
• A mid-size industrial program: space.c
• Total of 34 faulty versions

Mutant Generation
• Four classes of mutation operators were applied (a sketch follows):
  – Replace an integer constant by another integer.
  – Replace an operator with another operator from the same class.
  – Negate the decision in an if or while statement.
  – Delete a statement.
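A line-based Python sketch of two of these operator classes (decision negation and statement deletion); this is only an illustration of the idea, not the generator that was applied to space.c:

import re

# Toy illustrations of two operator classes: negate the decision in an
# if/while statement, and delete a statement.

def negate_decision(line):
    """Wrap the condition of an if/while in a negation."""
    m = re.match(r"(\s*)(if|while)\s*\((.*)\)(.*)", line)
    if m:
        indent, kw, cond, rest = m.groups()
        return f"{indent}{kw} (!({cond})){rest}"
    return None

def delete_statement(line):
    """Replace a simple statement (one ending in ';') with an empty one."""
    return ";" if line.strip().endswith(";") else None

program = ["if (report == null)", "    println(report);"]
for lineno, line in enumerate(program, start=1):
    for operator in (negate_decision, delete_statement):
        mutated = operator(line)
        if mutated is not None:
            print(f"mutant at line {lineno}: {mutated!r}")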


Page 11: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Experimental Description (4/4)

Test programs
• 11,379 mutant programs were generated.
• Tests were run on every 10th mutant generated (1,137 mutants); 736 of them exhibited faulty behavior.

Overall experimental approach
• Generate test suites for various coverage criteria and levels; find test suites from 50% coverage up to high coverage.
• Randomly select five test suites for each of these intervals:
  – 50.00%–50.99% coverage
  – …
  – 95.00%–95.99% coverage

From the same test pool, generate:
• Coverage suites: 230 test suites
• Random suites: 1,700 test suites
(A selection sketch follows.)
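A sketch of how suites could be binned by coverage interval. Here measure_coverage (returning a percentage), the fixed candidate suite size, and all parameters are stand-ins for the real coverage tooling and suite construction, not the paper's actual procedure:

import random

# Sample candidate suites from the test pool and keep five per 1% coverage
# interval between 50% and 95%.
def select_coverage_suites(pool, measure_coverage, lo=50, hi=96, per_bin=5,
                           suite_size=20, attempts=100_000, seed=0):
    rng = random.Random(seed)
    bins = {b: [] for b in range(lo, hi)}        # one bin per 1% interval
    for _ in range(attempts):
        suite = rng.sample(pool, suite_size)
        b = int(measure_coverage(suite))         # e.g. 73.4% -> bin 73
        if b in bins and len(bins[b]) < per_bin:
            bins[b].append(suite)
            if all(len(s) == per_bin for s in bins.values()):
                break
    return bins

# usage: suites_by_interval = select_coverage_suites(test_pool, block_coverage)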

Page 12: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Analysis Results (1/10)

Q1: Are mutation scores good predictors of actual fault detection ratios?
• Calculate the difference between the scores and the ratios:
  – Am = mutation detection ratio (mutation score)
  – Af = fault detection ratio
• Calculate the statistical significance of the difference: p-value < 0.001.


Am does not significantly and systematically under- or overestimate Af.

[Table: Descriptive Statistics for Af − Am]

Page 13: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Analysis Results (2/10)

Q1: Is Am a good predictor of Af?
• Can we accurately predict Af based on Am?
• MRE (magnitude of relative error) = |Af − Am| / Af (a small example follows)
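A minimal sketch of this definition, with made-up numbers:

# Magnitude of relative error between the fault detection ratio Af and the
# mutation score Am, as defined above.
def mre(af, am):
    return abs(af - am) / af

# e.g. a suite with Am = 0.72 that detects 80% of the real faults:
print(mre(0.80, 0.72))   # ~0.1, i.e. a 10% relative error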


• Average MREs range from 0.083 to 0.098: relative error under 10%.
• MRE decreases as %Coverage increases.

Achieve high %Coverage to predict Af more accurately.

Page 14: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Analysis Results (3/10)

Q1: Is Am a good predictor of Af?
• Perform linear regression between Af and Am.

Am is an unbiased and accurate predictor of Af.


Use Am as a measure of fault detection effectiveness (regression R = 0.908; a sketch follows).
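A sketch of the regression step with illustrative (made-up) values, using Python's statistics module (3.10+):

from statistics import correlation, linear_regression

# Regress the fault detection ratio Af on the mutation score Am across
# test suites. The six (Am, Af) pairs below are invented for illustration.
am = [0.55, 0.63, 0.71, 0.80, 0.88, 0.95]
af = [0.52, 0.66, 0.69, 0.78, 0.90, 0.93]

slope, intercept = linear_regression(am, af)
print(f"Af ~ {slope:.2f} * Am + {intercept:.2f}, R = {correlation(am, af):.3f}")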

Page 15: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Analysis Results (4/10)

Q2: What is the cost of achieving a given %Coverage?
• Use exponential regression.
• Assume the effort associated with a coverage level is proportional to test suite size.


Achieving higher levels of coverage becomes increasingly expensive (R = 0.943 to 0.977).

Cost of achieving a coverage level: Block < C-Use < Decision < P-Use. (A fitting sketch follows.)
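One way to fit such an exponential cost model, size = a * exp(b * coverage), is a linear regression of log(size) on coverage; the data points below are made up, not the paper's:

import math
from statistics import linear_regression

# Fit size = a * exp(b * coverage) by regressing log(size) on coverage.
coverage = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
size = [4, 6, 10, 18, 40, 70]

b, log_a = linear_regression(coverage, [math.log(s) for s in size])
print(f"size ~ {math.exp(log_a):.2f} * exp({b:.2f} * coverage)")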

Page 16: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Analysis Results (5/10)

Q3: What levels of coverage should be achieved to obtain reasonable levels of Am?
• Use exponential regression.


Curves range from nearly linear to mildly exponential.

At a given coverage level, Am increases from Block, to C-Use, to Decision, to P-Use: Block < C-Use < Decision < P-Use.

Page 17: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Analysis Results (6/10)

Q4: What is the cost-effectiveness of the control and data flow coverage criteria?
• Use exponential regression between Am and size: Am = a * size^b (a fitting sketch follows).
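The power-law model can be fitted on log-log axes; again the data points are illustrative, not the paper's:

import math
from statistics import linear_regression

# Fit Am = a * size^b by linear regression on log-log axes (invented data).
size = [5, 10, 20, 40, 80, 160]
am = [0.45, 0.58, 0.68, 0.77, 0.84, 0.90]

b, log_a = linear_regression([math.log(s) for s in size],
                             [math.log(v) for v in am])
print(f"Am ~ {math.exp(log_a):.2f} * size^{b:.2f}")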


None of the four criteria is more cost-effective than the others (R = 0.976).

[Table: Regression Coefficients, R², and Surface Areas for All Am–Size Models]

Page 18: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Analysis Results (7/10)

Q5: What is the gain of using coverage criteria compared to random test suites (the null criterion)?
• Use exponential regression between Am and size.

The gain of using coverage criteria increases as test suite size increases.


Coverage criteria are more cost-effective than the null criterion.

Page 19: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Analysis Results (8/10)

Q6: Is the effect of coverage simply a size effect?
• If so, random test suites would perform as well as coverage-criteria test suites.
• Use bivariate regression considering both test suite size and coverage level to explain variations in Am (R = 0.99; a sketch follows the findings below).


Coverage criteria help develop more effective test suites beyond size effect.

PredAm is the Am predictor obtained using size and coverage as covariates.

Both size and coverage play a complementary role in explaining fault detection despite their strong correlation.
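A sketch of such a bivariate model, Am ~ b0 + b1*size + b2*coverage, fitted by least squares on made-up data (assumes NumPy):

import numpy as np

# Explain Am with both suite size and coverage level in one linear model.
size = np.array([5, 10, 20, 40, 80, 160], dtype=float)
coverage = np.array([0.52, 0.61, 0.70, 0.79, 0.88, 0.95])
am = np.array([0.46, 0.57, 0.67, 0.78, 0.85, 0.91])

X = np.column_stack([np.ones_like(size), size, coverage])
beta, *_ = np.linalg.lstsq(X, am, rcond=None)   # [b0, b1, b2]
pred_am = X @ beta                              # plays the "PredAm" role above

ss_res = np.sum((am - pred_am) ** 2)
ss_tot = np.sum((am - am.mean()) ** 2)
print(f"coefficients: {beta}, R^2 = {1 - ss_res / ss_tot:.3f}")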

Page 20: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Analysis Results (9/10)

Q7: How is the cost-benefit analysis of coverage criteria affected by fault detection difficulty?
• Use "hard" mutants, detected by < 5% of the test cases, and "very hard" mutants, detected by < 1.5% of the test cases (a classification sketch follows).
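The difficulty split can be expressed directly from the thresholds above (the detection counts in the usage lines are hypothetical):

# Classify a mutant's difficulty by the fraction of all test cases that
# detect it, using the thresholds from the slide.
def difficulty(detecting_tests, total_tests):
    ratio = detecting_tests / total_tests
    if ratio < 0.015:
        return "very hard"
    if ratio < 0.05:
        return "hard"
    return "ordinary"

print(difficulty(12, 1000))   # 1.2% of tests detect it -> "very hard"
print(difficulty(30, 1000))   # 3.0% -> "hard"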

Both subsets show a nearly linear relationship.

Similar relationships were reported by Hutchins et al., who had seeded much more difficult faults.


Page 21: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Analysis Results (10/10)

Q7: How is the cost-benefit analysis of coverage criteria affected by fault detection difficulty?

Am sharply increases in the highest 10-20% coverage levels.

Similar relationships were reported by Hutchins et al., who had seeded much more difficult faults.


The relationships between Am and test suite size or coverage are very sensitive to fault detection difficulty.

Page 22: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Conclusion

Contributions
• Investigated the feasibility of using mutation analysis to assess the cost-effectiveness of control/data flow criteria.
  – Mutation analysis can be used in a research context to compare and assess new testing techniques.
• Applied mutation analysis to fundamental questions regarding the relationships between fault detection, test suite size, and control/data flow coverage.
• Suggested a way to tune the mutation analysis process to a specific environment's fault detection pattern.


Page 23: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

Discussion

Pros
• Precisely defined and justified analysis procedure
• Based on a large number of mutant programs: a replicable experiment

Cons
• Only one small program with real faults
• Testing cost modeled only by the size of test suites


Page 24: Using Mutation Analysis for Assessing and Comparing Test Coverage Criteria

ⓒ KAIST SE LAB 2013