07 mutation testing handout - coinse · 2018-03-26 · mutation testing • white-box, fault-based...

Mutation Testing

CS453, Shin Yoo

Mutation Testing

• White-box, fault-based testing technique

• Inverts the testing adequacy: the goal is to assess the effectiveness of the existing test suite in terms of its fault detection capabilities.

• Test suites test programs

• Mutants test test suites

• The most widely used adequacy score is called the Mutation Score: it measures the quality of the given test suite as the percentage of injected faults that the suite detects.

How do you choose the ideal test data?


How it works?


How it works?

How do you demonstrate that the bendy road is the better test environment?

How it works?


How it works?

Sabotage the car!

How it works?

Some tests are kinder(?) to faults: we want tests that are mean to faults.

Mutation Testing

• Testing is a sampling process: without a priori knowledge of faults, it is hard to assess how well a technique samples.

• Mutation testing: the quality of a test suite can be indirectly measured by artificially injecting faults and checking how well the test suite can detect them.

• Seed the original implementation with faults (the seeded versions are called mutants)

• Execute the given test suite

• If we get different test results, the introduced fault (the mutant) has been detected (i.e., the mutant is killed). If not, the mutant is still alive.
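The seed, execute, compare loop can be sketched in a few lines. This is a minimal illustration in Python, not the procedure of any particular tool: the functions `original` and `mutant`, the helper `is_killed`, and both test suites are hypothetical names and data invented for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical program under test and one hand-written mutant (illustrative only).
def original(a: int, b: int) -> int:
    return a if a > b else b          # returns the larger value

def mutant(a: int, b: int) -> int:
    return a if a < b else b          # seeded fault: '>' replaced by '<'

def is_killed(orig: Callable, mut: Callable, tests: List[Tuple[int, int]]) -> bool:
    """A mutant is killed if at least one test input yields a different result."""
    return any(orig(*t) != mut(*t) for t in tests)

lenient_suite = [(1, 1), (5, 5)]      # equal inputs never expose the fault
demanding_suite = [(1, 1), (2, 7)]    # (2, 7): original gives 7, mutant gives 2

print(is_killed(original, mutant, lenient_suite))    # False: mutant still alive
print(is_killed(original, mutant, demanding_suite))  # True: mutant killed
```

Here the output of the original program serves as the expected result, so no separate oracle is needed to decide whether a mutant is killed.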

Fundamental Hypotheses

• Competent Programmer Hypothesis

• Coupling Effect Hypothesis

Competent Programmer Hypothesis

Q: what do the programmers and the monkeys have in common when it comes to programming?

A: they write buggy code.

Competent Programmer Hypothesis

• On average, programmers are competent, i.e., they write almost correct programs. The source code of a faulty program differs from the correct one only in a few minor details.

Coupling Effect Hypothesis

• If a test set detects all small syntactic faults, it will also detect larger, semantic faults: especially if those semantic faults are coupled with the small faults.

• Richard A. DeMillo, Richard J. Lipton, and Frederick Gerald Sayward, Hints on Test Data Selection: Help for the Practicing Programmer, Computer, 11(4), 1978.

• A. Jefferson Offutt, Investigations of the Software Testing Coupling Effect, ACM Transactions on Software Engineering and Methodology, 1(1), January 1992.

Coupling Effect Hypothesis

[Figure: the space of faults, containing simple faults and complex faults]

Fundamental Hypotheses

• Competent Programmer Hypothesis: programmers are likely to make simple faults.

• Coupling Effect Hypothesis: if we catch all the simple faults, we will be able to catch more complicated faults.

• Mutation testing: therefore, let us artificially inject simple faults!

Mutation Testing Process

[Figure: bugs are seeded into the program to produce mutants; the test suite is run against them. Same results: the mutant is alive. Different results: the mutant is killed.]

Mutant Generation

• P’ differs from P by a single mutation

• Mutation: a typical simple error programmers are likely to make - off by one, typo, mistaken identity, etc.

Program P Mutant P’

Mutation Operator

An atomic rule that is used to generate a mutant

A Fortran Language System for Mutation-Based Software Testing, Kim N. King and Jeff Offutt. Software Practice and Experience, 21(7):686-718, July 1991

ABS: Absolute Value Insertion

Original: x = 4 * y;

Mutants: x = 4 * abs(y); x = 4 * -abs(y); x = 4 * failOnZero(y);

AOR: Arithmetic Operator Replacement

Original: x = y + z;

Mutants: x = y * z; x = y - z; x = y / z;

ROR: Relational Operator Replacement

Original: if(x >= y)

Mutants: if(x > y) if(x == y) if(x < y) if(x != y) …

COR: Conditional Operator Replacement

Original: if(x && y)

Mutants: if(x || y) if(x & y) if(x | y)

SDL: Statement Deletion

Original: x = 3; y = x + 5; z = x - y;

Mutant (the statement y = x + 5; deleted): x = 3; z = x - y;
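A mutation operator such as AOR can be applied mechanically to a syntax tree. The sketch below uses Python's standard `ast` module (Python 3.9 or later for `ast.unparse`) and Python syntax rather than the C-like syntax of the slides; the replacement table `AOR_REPLACEMENTS` and the function `aor_mutants` are hypothetical names chosen for this illustration, and the table covers only `+`.

```python
import ast

# Replacement table for AOR; covering only '+' is an illustrative choice.
AOR_REPLACEMENTS = {ast.Add: [ast.Mult, ast.Sub, ast.Div]}

def aor_mutants(source: str) -> list[str]:
    """Return the source text of every first-order AOR mutant of `source`."""
    mutants = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.BinOp):
            continue
        original_op = node.op
        for replacement in AOR_REPLACEMENTS.get(type(original_op), []):
            node.op = replacement()            # apply exactly one mutation
            mutants.append(ast.unparse(tree))  # record the mutated program
        node.op = original_op                  # restore before moving on
    return mutants

for m in aor_mutants("x = y + z"):
    print(m)
# x = y * z
# x = y - z
# x = y / z
```

Each mutant differs from the original in exactly one operator, which matches the requirement that P’ differs from P by a single mutation.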

Mutation Operator

• Any systematic and syntactic change operator can be considered.

• For C: 71 Mutation Operators (Statement 15, Operator 46, Variable 7, Constant 3)

• Design of Mutant Operators for the C Programming Language by Hiralal Agrawal, Richard A DeMillo, R Hathaway, William Hsu, Wynne Hsu, Edward W Krauser, Rhonda J Martin, Aditya P Mathur, Eugene H Spafford, technical report, Purdue University, 1989

• For Class Mutation: 24 Mutation Operators (Access Control 1, Inheritance 7, Polymorphism 4, Overloading 4, Java-Specific Features 4, Common Programming Mistakes 4)

• Y.-S. Ma, Y.-R. Kwon, and J. Offutt. Inter-class mutation operators for Java. In Proceedings of the 13th International Symposium on Software Reliability Engineering, ISSRE ’02, pages 352–, Washington, DC, USA, 2002. IEEE Computer Society.

Mutation Operator

• Any systematic and syntactic change operator can be considered.

• For spreadsheets:

• RCR (Reference for Constant Replacement)

• FRC (Formula Replacement with Constants)

• CRE (Contiguous Range Expansion)

• CRS (Contiguous Range Shrinking)

• …

• R. Abraham and M. Erwig. Mutation operators for spreadsheets. IEEE Transactions on Software Engineering, 35(1):94–108, 2009.

Killing a Mutant

Program P: x = y + z; … print(x);

Mutant P’: x = y * z; … print(x);

Test: y = 2, z = 2 → P prints 4, P’ prints 4 (alive)

Test: y = 3, z = 1 → P prints 4, P’ prints 3 (killed)

Killing a Mutant

Program P: x = y + y; … print(x);

Mutant P’: x = y * 2; … print(x);

Test: y = 2 → P prints 4, P’ prints 4 (alive)

Test: y = 3 → P prints 6, P’ prints 6 (alive)

Equivalent Mutant

• An equivalent mutant is syntactically different from, but semantically identical to, the original program.

• x = y + y; vs. x = y * 2;

• Checking whether an arbitrary mutant is equivalent or not is undecidable.

• This is one of the major obstacles to the mainstream adoption of mutation testing.

• “My mutation score is 70%. Is my test suite bad, or are there too many equivalent mutants?”
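The difference between a killable mutant and an equivalent one can be seen by searching for inputs that distinguish the two versions. The sketch below is only a bounded check over hand-picked candidates (the helper `distinguishing_inputs` and the input ranges are assumptions made for illustration); finding no distinguishing input in a finite range does not prove equivalence in general, which is why the problem is hard.

```python
def distinguishing_inputs(f, g, candidates):
    """Return the candidate inputs on which f and g produce different results."""
    return [c for c in candidates if f(*c) != g(*c)]

# Killable mutant: x = y + z  vs.  x = y * z
killable = distinguishing_inputs(
    lambda y, z: y + z,
    lambda y, z: y * z,
    [(2, 2), (3, 1)],
)
print(killable)      # [(3, 1)]: only the second test kills the mutant

# Equivalent mutant: x = y + y  vs.  x = y * 2
equivalent = distinguishing_inputs(
    lambda y: y + y,
    lambda y: y * 2,
    [(y,) for y in range(-1000, 1001)],
)
print(equivalent)    # []: no sampled input separates the two versions
```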

Mutation Score

$$MS = \frac{\#\text{ of killed mutants}}{\#\text{ of all mutants}}$$

$$MS = \frac{\#\text{ of killed mutants}}{\#\text{ of non-equivalent mutants}}$$
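As a small worked example of the two definitions, the function below computes the score with and without discounting equivalent mutants. The function name `mutation_score` and the counts (100 mutants, 70 killed, 30 equivalent) are made up for illustration.

```python
def mutation_score(killed: int, total: int, equivalent: int = 0) -> float:
    """Fraction of non-equivalent mutants that the test suite killed."""
    killable = total - equivalent
    if killable <= 0:
        raise ValueError("no non-equivalent mutants to score")
    return killed / killable

# Same test suite, same kills: the score depends on how many mutants are equivalent.
print(mutation_score(killed=70, total=100))                  # 0.7
print(mutation_score(killed=70, total=100, equivalent=30))   # 1.0
```

If the 30 surviving mutants all happen to be equivalent, a suite that scores 70% under the first definition is in fact killing every killable mutant.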

How to kill a mutant

• Reachability: your test execution needs to reach (i.e. cover) the mutant

• Infection: the mutated code should infect the program state (i.e. the value of the mutated expression differs from the value of the original expression)

• Propagation: the infected state results in an observable difference in the output

How to kill a mutant

• Reachability + Infection: weak kill (i.e. we stop after confirming infection, do not check the propagation to the outside world)

• Reachability + Infection + Propagation: strong kill (i.e. the kill can be observed from the outside world)

Killing me softly weakly…

if(x < y){
    if(z < y){
        if(x < z) result = z;
        else result = x;
    } else
        result = y;
} else
    result = 0;

Mutation:

if(z < y + 1){

Reachability Condition: x < y

Infection Condition: (z < y) != (z < y + 1)

Weak Kill Condition: (x < y) && (z < y) != (z < y + 1)

∴ (x < y) && (z == y)

Killing me softly strongly…

if(x < y){
    if(z < y){
        if(x < z) result = z;
        else result = x;
    } else
        result = y;
} else
    result = 0;

Mutation:

if(z < y + 1){

Reachability Condition: x < y

Infection Condition: (z < y) != (z < y + 1)

Weak Kill Condition: (x < y) && (z < y) != (z < y + 1)

∴ (x < y) && (z == y)

Propagation Condition:

After infection, x < y == z

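Under this condition:

Original: result = y

Mutant: result = z

∴ (x < y) && (z == y) && (y != z)

This condition is unsatisfiable (it requires both z == y and y != z), so the mutant can be weakly killed but never strongly killed.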
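The reachability, infection, and propagation conditions can also be checked by brute force over a small input range. The sketch below is a Python transcription of the example that assumes the inner else branch assigns `result = y` (as the propagation analysis implies); the function `run` and the input range are choices made for this illustration.

```python
from itertools import product

def run(x, y, z, mutated=False):
    """Return (inner condition value, result); the condition is None if not reached."""
    if x < y:
        inner = (z < y + 1) if mutated else (z < y)
        if inner:
            result = z if x < z else x
        else:
            result = y
        return inner, result
    return None, 0

weak, strong = [], []
for x, y, z in product(range(-3, 4), repeat=3):
    cond_o, res_o = run(x, y, z)
    cond_m, res_m = run(x, y, z, mutated=True)
    if cond_o is not None and cond_o != cond_m:   # reached and infected
        weak.append((x, y, z))
    if res_o != res_m:                            # difference is observable
        strong.append((x, y, z))

print(len(weak), len(strong))   # 21 0
```

Every weakly killing input satisfies x < y and z == y, and no input in the range produces a different result, which agrees with the derived conditions.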

Scalability

• Normal testing: 1 program * 100 test cases

• Mutation testing: 1 program * 10000 mutants (including compilation!) * 100 test cases…

• We tend to get a large number of mutants:

• No prior knowledge of which mutation operator is the most effective (w.r.t. improving the test suite quality): the default is to apply everything

• Programs are large!

Scalability: do fewer

• Mutation Sampling: generate a large number of mutants, but use only a subset of them (natural question: how do we select?)

• Subsuming Mutant: a mutant M1 subsumes another mutant M2 if and only if killing M1 guarantees killing of M2.

• True subsumption relationship: not computable

• Dynamic subsumption: defined w.r.t. a given test suite

• Static subsumption: results of static analysis, still an approximation

• Selective Mutation: apply only a subset of mutation operators
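Two of these ideas, mutation sampling and dynamic subsumption, can be sketched over a kill matrix. Everything below is hypothetical: the mutant and test identifiers, the `kills` table, and the helper names. The subsumption check follows the definition above restricted to the given test suite, with the added requirement that the subsuming mutant is killed by at least one test.

```python
import random

# Hypothetical kill matrix: mutant id -> set of tests that kill it.
kills = {
    "m1": {"t1"},
    "m2": {"t1", "t2"},
    "m3": {"t3"},
    "m4": set(),          # survives every test
}

def sample_mutants(mutants, fraction, seed=0):
    """Mutation sampling: keep a random fraction of the generated mutants."""
    rng = random.Random(seed)
    k = max(1, round(len(mutants) * fraction))
    return rng.sample(sorted(mutants), k)

def dynamically_subsumes(a, b, kills):
    """a subsumes b w.r.t. the suite: a is killed, and every test killing a also kills b."""
    return bool(kills[a]) and kills[a] <= kills[b]

print(len(sample_mutants(kills, 0.5)))          # 2
print(dynamically_subsumes("m1", "m2", kills))  # True: t1 kills both
print(dynamically_subsumes("m2", "m1", kills))  # False: t2 kills m2 but not m1
```

Under this test suite m2 is redundant given m1: any test that kills m1 also kills m2, so keeping only m1 loses no information.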

Scalability: do smarter

• Super-mutant: compile all mutants into a single program, then activate a specific mutant at runtime (saves compilation time)

• Weak mutation testing: relax the kill criterion to weak kills (requires instrumentation for the embedded oracle)

• Parallel/distributed mutation testing: obvious.
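The super-mutant idea can be illustrated with a single function that contains the original code and all of its mutants, selected by a runtime switch. This is only a sketch: the function `add`, the `active_mutant` parameter, and the mutant numbering are assumptions for the example, and real tools typically generate such guards automatically.

```python
def add(y, z, active_mutant=0):
    """Original code and its mutants in one body; 0 selects the original."""
    if active_mutant == 1:
        return y * z      # AOR mutant 1: '+' replaced by '*'
    if active_mutant == 2:
        return y - z      # AOR mutant 2: '+' replaced by '-'
    return y + z          # original behaviour

def killed_mutants(tests, mutant_ids):
    """Run every test against every mutant without recompiling anything."""
    return {
        m for m in mutant_ids
        if any(add(y, z) != add(y, z, active_mutant=m) for (y, z) in tests)
    }

print(killed_mutants([(2, 2)], [1, 2]))   # {2}
print(killed_mutants([(3, 1)], [1, 2]))   # {1, 2}
```

The program is built once, and switching mutants costs only a change to the selector value.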

Trivial Compiler Equivalence

• Idea: some syntactic changes may compile into the same binary code thanks to compiler optimisation - if the binary is the same, the corresponding syntactic change is an equivalent mutant.

• A large-scale empirical study showed that TCE detects 7% of all mutants as equivalent; more importantly, 21% of all mutants were duplicates (i.e. not equivalent to the original program, but compiled to the same binary as another mutant).

• M. Papadakis, Y. Jia, M. Harman, and Y. Le Traon. Trivial compiler equivalence: A large scale empirical study of a simple, fast and effective equivalent mutant detection technique. In Proceedings of the 37th International Conference on Software Engineering-Volume 1, pages 936–946. IEEE Press, 2015.

Higher Order Mutants

• FOM (First Order Mutant): mutants that are generated by a single application of one mutation operator

• HOM (Higher Order Mutant): mutants that are generated by two or more applications of a set of mutation operators

• Some studies claim that most FOMs are trivial to kill, and that few of them are coupled with real faults.

• We can search for a combination of FOMs that results in a hard-to-kill HOM.

For Researchers

• Code mutation has an alternative use for academic researchers: it can create a set of artificial faults, with which new testing techniques can be evaluated

• The Big Question: are mutants really similar to real faults?

• Touches on the same fundamental basis of mutation testing itself

• Still an open question!

Tools

• Fortran: Mothra - had a long-lasting impact with its definition of mutation operators

• C/C++: Proteum, MiLU (also searches for HOMs), MUSIC (developed at KAIST)

• Java: muJava (a special tie to KAIST), Major, Javalanche (bytecode mutation), PIT

• JavaScript: Stryker

• Ruby: Heckle,

References

• Y. Jia and M. Harman. An analysis and survey of the development of mutation testing. IEEE Transactions on Software Engineering, 37(5):649–678, 2011.

• Mutation Testing Repository (http://crestweb.cs.ucl.ac.uk/resources/mutation_testing_repository/): an online repository that accompanies the above survey
