
Easy Test! A Conceptual tool for specification based testing

This report is submitted in partial fulfilment of the requirements for the degree of MSc (Eng) in Advanced Software Engineering

Omair Ali 2007

Supervisor: Dr Mike Holcombe


Declaration

All sentences or passages quoted in this dissertation from other people's work have been specifically acknowledged by clear cross-referencing to author, work and page(s). I understand that failure to do this amounts to plagiarism and will be considered grounds for failure in this dissertation and the degree examination as a whole.

Name: Omair Ali
Signature:
Date: 11th September, 2007


Abstract

Norman Fenton states in his book Software Metrics (1997 edition) that a test-first strategy is 50% more productive, based on LOC measurements. The renowned author Nell Dale remarks, "Testing must begin in CS1 (equivalent to first year) courses and reinforced in each succeeding year", in view of rampant non-conformance to the ACM/IEEE Computing Curriculum 2001. Another renowned author, Michael Goldwasser, observed at SIGCSE 2002, "Testing as a unit is given very little coverage in introductory courses, often with only an expectation of ad-hoc debugging of one's own implementation". Goldwasser further states (2002) that XP has the largest market share among all agile methodologies. Glenford Myers states in his book The Art of Software Testing (2004 edition) that the rule of thumb is that most projects expend 50 percent of development time, and about the same proportion of cost, on testing.

The above paragraph may seem a haphazard sequence of observations. However, the astute reader will infer the stark contradiction between what happens in academia and the trends in the software industry: software testing has received as little attention in the curriculum as it has gained importance in the software industry. 'Test first' is the buzzword of the industry, while at the teaching level testing is not even treated methodically as an important part of the beginner's level of the undergraduate course.

The main aim of this project is to inculcate the notion of testing among students of the first Java course of the Department of Computer Science at the University of Sheffield. It seeks to develop an awareness of the importance and benefits of testing among beginners, so that they can think more logically about the possible faults their elementary programs may contain. It attempts to convert testing from an art to a craft right at the inception of programming fundamentals. The produced system demonstrates the process of writing elementary tests for simple programs from their specifications. It also generates sample test data so that students can observe the utility and purpose of such test data. This puts the testing problem into perspective.


Acknowledgements

I would like to deeply thank my supervisor, Prof. Mike Holcombe, for the useful and helpful guidance he provided during the course of the project. I would also like to thank Dr Georg Struth for providing valuable feedback which helped me improve my thesis to a great extent, and Dr Anthony Cowling and Dr Neil Walkinshaw for their help during the initial stages of the project. I would also like to thank my friends Ahmet Aker, Shihai Wang, and M. K. Faizuddin for their tremendous support during the course of my project. A special thanks to my family for their support, guidance and encouragement, which kept my morale high in troubled times.


Table of Contents

1 Chapter 1: Introduction
  1.1 Why Testing?
  1.2 Motivation
  1.3 Structure of the Thesis
2 Chapter 2: Literature Review
  2.1 Testing techniques
    2.1.1 Black-Box Testing
    2.1.2 White-Box Testing
    2.1.3 Model Based Testing
  2.2 Test Based Programming Approaches
    2.2.1 Extreme Programming and Test Driven Development
    2.2.2 SpecSharp (Spec#) Programming System
  2.3 Automatic Test Data Generation
    2.3.1 Symbolic Execution
    2.3.2 Model Checking
    2.3.3 Related Work
    2.3.4 A Framework for Generating Object-Oriented Unit Tests [Symstra 2005]
    2.3.5 Search-based Software Test Data Generation
    2.3.6 Related Work
    2.3.7 Evolutionary Testing Environment for Automatic Structural Testing [Wegener 2001]
    2.3.8 Automatic Generation of Software Test Data Using Genetic Algorithms [Sthamer 1995]
  2.4 Conclusion
3 Chapter 3: Requirements & Analysis
  3.1 Primary Requirements
  3.2 Secondary Requirements
  3.3 Requirement Analysis
  3.4 Testing & Evaluation Strategies
4 Chapter 4: Design
  4.1 System Goals
  4.2 System Design
  4.3 System Architecture
    4.3.1 View
    4.3.2 Controller
    4.3.3 Model
  4.4 System Layout
5 Chapter 5: Implementation & Testing
  5.1 Implementation
    5.1.1 Methodology
    5.1.2 Refinements
  5.2 Testing
    5.2.1 Testing Driven by examples
    5.2.2 Usability Testing
6 Chapter 6: Results & Discussion
  6.1 Results
  6.2 Achievements & Discussions
  6.3 Further Work
7 Chapter 7: Conclusion
8 References
9 Appendix A


List of Figures

Fig. 2.1: Control Flow graph of a small program code.
Fig. 2.2: Code snippet and its Control Flow graph (an example from [Myers 2004]).
Fig. 2.3: Machine code for the program in Figure 2.2 [Myers 2004].
Fig. 2.4: Code that swaps two integers and the corresponding symbolic execution tree, where transitions are labeled with program control points.
Fig. 2.5: Schema of the model checking method [Barnat 2002].
Fig. 2.6: a) Example program code, b) typical method invocation sequence, and c) Symstra's symbolic method invocations.
Fig. 2.7: High-level description of a hill climbing algorithm, for a problem with solution space 'S', neighbourhood structure 'N', and 'obj', the objective function to be maximized.
Fig. 2.8: High-level description of a Simulated Annealing algorithm, for a problem with solution space 'S', neighbourhood structure 'N', 'num_solns' the number of solutions to consider at each temperature level 't', and 'obj', the objective function to be minimized.
Fig. 2.9: Evolutionary Algorithm.
Fig. 2.10: a) Block diagram of GA, b) pseudo code of GA.
Fig. 2.11: Components of the evolutionary test environment.
Fig. 4.1: Overview of System Goals.
Fig. 4.2: Use Case overview of System Functionality.
Fig. 4.3: The 'Partitioning' use-case and the 'JUnit' use-case specialize the 'Computation & Estimation' use-case.
Fig. 4.4: The 'Test Generation' use-case includes the 'Partitioning' use-case.
Fig. 4.5: Using Observer to decouple the model from the view in the Active model.
Fig. 4.6: Behavior of the active model.
Fig. 4.7: The model classes.
Fig. 4.8: The dependency resolution group.
Fig. 4.9: The partitioning group.
Fig. 4.10: The test frames generator group.
Fig. 4.11: The dynamic JUnit test classes.
Fig. 4.12: a) The Lexical Analyzer structure, b) the Parser classes' layout.
Fig. 4.13: Class diagram showing the structure of the test frame generator classes.
Fig. 4.14: Sequence diagram of the system.
Fig. 5.1: Test cases (data) for the Triangle problem.
Fig. 5.2: Test cases (data) for the Find Character problem.
Fig. 5.3: Test cases (data) for the Circle problem.
Fig. 5.4: Test cases (data) for a hypothetical problem.


Chapter 1: Introduction

The following subsections briefly explain the importance of testing and the motivation for this project, and then present an overview of the whole thesis.

Why Testing?

One cannot envisage a world of automated systems without the notion of testing being an integral part of it. The motivation behind any kind of testing is the immense importance attached to it in every field of software or embedded systems development: in an era in which almost every aspect of human life has been automated, fortunes and lives are at stake if faults persist in a safety-critical system. Software either enables millions of people to do their jobs effectively and efficiently, or causes them untold frustration and the cost of lost business.

Software testing is a process (or series of processes) undertaken to ensure that computer code does what it was designed to do, and nothing unintended. A common mistake is to undertake testing of a program with the aim of demonstrating that it works, rather than starting with the assumption that the program contains errors; after all, the main purpose of testing is to detect errors in a program. To think of testing as a process of demonstrating that errors are not present is absurd, since this is impossible for virtually all programs, even trivial ones [Myers 2004]. To summarize, software testing is a destructive process of trying to find the errors (whose presence is assumed) in a program, and a successful test case is one that causes the program under test to fail. Testing thereby establishes some degree of confidence that a program does what it is supposed to do and does not do what it is not supposed to do.

Software testing is the most common activity for validating the quality of software. It is undoubtedly a labour-intensive process; its true impact on software development cost is underlined by statistics which suggest that it accounts for approximately half the total cost of software development and maintenance [Beizer 1990]. This figure can be justified if we consider the amount of person-effort that is spent on testing. According to Deutsch (1982), 44 to 50 percent of person-effort is usually devoted to the testing of projects. Boehm (1978) suggests that testing effort could vary from 28 percent for a 'business-type' project to 50 percent for 'operating systems', with a variety of other types of projects lying within this range.

Testing is so important to software engineering that new approaches to software development, such as eXtreme Programming (XP), require developers to write tests based on the specifications even before writing code, and to run these tests regularly as the code is implemented. This instils confidence in the code, and running the tests each time increments of code are integrated prevents the functional problems which usually arise when major components are finally brought together. However, writing tests is a difficult task, especially when the importance of testing is not implanted in developers from the early stage at which they begin to form their concepts of programming.


Motivation

The above points to a need for restructuring the way programming courses in software engineering degrees are currently designed, as they often do not deal with testing in any kind of detail. According to Nell Dale [Dale 2000], "Testing must begin in CS1 (equivalent to first year) courses and reinforced in each succeeding year", in view of rampant non-conformance to the ACM/IEEE Computing Curriculum 2001. M. Goldwasser observed in [Goldwasser 2002], "Testing as a unit is given very little coverage in introductory courses, often with only an expectation of ad-hoc debugging of one's own implementation". "Testing has been an out-of-vogue subject", quips [Myers 2004], which further states, "Our students graduate and move into industry without any substantial knowledge of how to go about testing a program". Redressing this very fallacy was the conception behind this project, which aims to inculcate in the novice programmer the notion of testing from the grass-roots level. It strives to sow the seeds of the notion of testing and to help novices develop the skill to identify potential faults in their program code.

This project is about creating a test support system, Easy Test, which provides the facilities of both a learning tool and a practical testing tool. Its focus is on an 'orientation, inculcation and demonstration' methodology for use in the first Java programming course at the undergraduate level in the Department of Computer Science. It comprises two parts: the first is a learning tool explaining the concepts of testing, while the second creates tests and demonstrates the application of these tests to the program under test. The first part consists of tutorial pages that guide students in developing the craft of testing by thinking about simple classes and functions in order to write tests; it helps develop an approach to writing tests in a structured manner. The second, more functionally intensive, part of the tool is the test generator and executor. It is based on the XP concept of a 'test-first' strategy: it generates tests from the specifications of a program's functional units, and these tests can be applied as and when the implementation of the functional units becomes available. The aim is to develop a powerful tool that gives students an insight into writing tests and an opportunity to check their acquired skill by observing the tests the system generates for their code and the results of executing those tests against it. The purpose is to point out to novice programmers instances of possible faults that can cause the system to fail or throw it into an unstable (e.g. exception-raising) state.

Structure of the Thesis

This thesis is structured as follows.

Chapter 2: Literature Review - reviews related literature on testing. It includes an overview of the various testing techniques and methodologies commonly used in the field, provides a description of a couple of test-based programming approaches, and brings to light the various methods of automatic test data generation.


Chapter 3: Requirements & Analysis - highlights the overall objectives, requirements and goals of the project. Requirements are analysed and a list of system functionalities and goals is drawn up.

Chapter 4: Design - explains the design considerations that were made before implementing the structure of the system. The issues that make the system architecture easy to understand and maintain are considered, and the factors that improve the modularity of the system and reduce the coupling between its modules are reviewed and introduced into the system.

Chapter 5: Implementation and Testing - describes the implementation of the system. It reviews the methodology used to build the system, includes the refinements that were made to the system's requirements in order to improve its overall functionality, and treats the strategy for testing the project's system.

Chapter 6: Results & Discussions - presents the overall results of the project together with a critical evaluation of both the results and the achievements. It also describes further work that may be done to improve the system.

Chapter 7: Conclusion - summarises the work done in the project.

References and Appendices are included at the end.


Chapter 2: Literature Review

Software testing is indispensable for any software application, and it can be carried out using various testing strategies. This chapter presents an overview of these strategies and attempts to highlight the fundamental differences between several approaches to software testing. It also gives an insight into automatic test data generation.

Testing techniques

There are a number of test case design methods used in software development which provide a systematic approach to testing. The general aim of these methods is to uncover as many of the errors present as possible. A software system may be tested using a number of approaches; a few of them are discussed below: black-box testing, white-box testing and model based testing.

Black-Box Testing

This approach to testing is based solely on the specifications or requirements of a software system; the internal structure of the program is not considered when selecting test cases. The sole objective is to determine where the tested program's input-output behaviour does not agree with its specification. Testing is based only on knowledge of the inputs to the system and the outputs from it, i.e. test data for the program under test are constructed from its specification [Beizer 1990]. This is the very essence of its strength: tests can be generated early in the development cycle, which helps in detecting 'missing logic' faults in the implementation [Hamlet 1987]. Black-box testing is also called functional or specification based testing.

In order to find all errors using this approach we would have to test every possible input condition, i.e. carry out exhaustive input testing. This means a virtually infinite number of test cases for any non-trivial (often even for a trivial) program [Myers 2004]. The problem worsens for a memory-based system, e.g. a database application or an operating system, because the test data would not only have to represent all unique valid and invalid transactions, but also sequences of transactions. It is therefore practically impossible to carry out an exhaustive input test of a program so as to guarantee that it is error free. However, it is possible to make certain reasonable assumptions about the program that allow the number of errors found by a finite number of test cases to be maximized. This is possible if an item of test data, representative of other test data of the same class or type, can be assumed to produce the same outcome as the others in its class when run against the system under test. Some of the methods using this approach are discussed below.

Equivalence Class Partitioning

The idea employed is that the input domain is divided into a set of equivalence classes such that, if one value of a class makes the program work (or behave) in a particular way, it can be assumed with reasonable surety that the program behaves in the same way for all other values of that class. Each set of inputs for which the system is expected to behave differently from the others is grouped into a separate equivalence class. Testing the program with one member of each equivalence class is then taken to stand for an exhaustive test of the system.


This can be illustrated using an example from [Jalote 2005], where a system to determine the absolute value of integers specifies different behaviours for positive and negative integers. Hence two equivalence classes can be created: one comprising the positive integers and the other comprising the negative integers. For robust testing, invalid inputs, as well as the outputs, should also be divided into equivalence classes in order to test the system with representative values of the input classes (valid or invalid) and to determine whether its output lies in the expected output equivalence class. Any special input values for which the system could behave differently can also be treated as equivalence classes of their own. Test cases can be selected so that each covers a valid equivalence class for each input, with one separate test case for every invalid equivalence class.
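To make the idea concrete, the following minimal sketch (my own illustrative code, not taken from [Jalote 2005]) shows one representative value being chosen from each equivalence class of the absolute-value example:

public class AbsoluteValueClasses {

    // Method under test: specified to behave differently for negative
    // and non-negative integers.
    static int absolute(int x) {
        return (x < 0) ? -x : x;
    }

    public static void main(String[] args) {
        // One representative value per equivalence class stands for the whole class.
        System.out.println(absolute(17));   // class of positive integers, expect 17
        System.out.println(absolute(-17));  // class of negative integers, expect 17
    }
}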

Boundary Value Analysis

It has often been found that certain values within equivalence classes are the ones for which programs fail to behave as expected: these are the values that lie near the boundaries of the equivalence classes. Hence test cases that use these values as representatives of their equivalence classes are the ones that usually detect errors, and this is the purpose of boundary value analysis. In this method, inputs from an equivalence class are chosen for a test case such that they lie at the edge of that equivalence class, i.e. the edge of each equivalence class is tested. This is done for every equivalence class, including the equivalence classes of the output parameters. A few guidelines [Myers 2004] for selecting test cases are as follows (a small sketch of guideline 1 follows the list):

1. When an equivalence class is a range of values, the boundary elements of the range and an invalid value just outside each end of the range can be chosen. For example, if the valid domain of an input value (a variable or parameter of a method under test) is -1.0 to +1.0, then the test cases would comprise -1.0 and 1.0 (valid inputs) and -1.001 and 1.001 (invalid inputs).

2. When the input is a list (i.e. a specified number of values), attention should be paid to the first and last elements of the list and to one element below and beyond these values. For example, if a file can hold 1-63 records, test data should be generated to test 0, 1, 63 and 64 records.

3. As in the previous method, the outputs should also be considered. Test cases should be selected in such a way that the outputs generated by the system lie at the boundaries of the output equivalence class. For example, if a certain tax deduction amount varies from £0.00 to £1299.99, then test data that results in deductions at both ends of the range should be considered. An attempt should also be made to construct test data, if it exists, that would cause the output value to lie outside the permissible range.

4. Similarly, if the output of the program code under test is a list (i.e. a specified number of values), then test cases should be created that generate results which are the first and last elements of the list. Here too, an attempt should be made to generate output that is not in the list. For example, if a holiday destination chooser displays the destinations that match the conditions and budget of the prospective tourist but never returns more than 5 destinations, test data should be written that matches zero, one, and five destinations, and test data that might cause the system to erroneously return more than five destinations should also be created, if possible.
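As an illustration of guideline 1, a small sketch (my own example, reusing the -1.0 to +1.0 range mentioned above) of the boundary test data that would be selected:

public class BoundaryData {
    public static void main(String[] args) {
        // Valid domain of the input is -1.0 to +1.0 inclusive.
        double[] validBoundary   = { -1.0, 1.0 };      // exactly on the edges
        double[] invalidBoundary = { -1.001, 1.001 };  // just outside the edges

        for (double v : validBoundary) {
            System.out.println(v + " : should be accepted by the method under test");
        }
        for (double v : invalidBoundary) {
            System.out.println(v + " : should be rejected by the method under test");
        }
    }
}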


Cause Effect Graphing

A drawback of boundary value analysis and equivalence partitioning is that they do not test combinations of input circumstances, and the number of combinations of equivalence partitions is usually astronomical for any non-trivial system. In the absence of a systematic selection procedure for the input conditions, an arbitrary set of conditions would invariably be selected, leading to an ineffective test data set. Cause effect graphing is a technique which strives to constrain the number of test cases that must be selected in order to exercise all combinations of equivalence classes of input conditions. It provides a systematic technique that starts by identifying the causes and effects of the system under test and specifying them in such a way that each can be either true or false. "A cause is a distinct input condition and an effect is a distinct output condition" [Jalote 2005]. These conditions form the nodes of the cause-effect graph. The combinations of causes that produce each of the effects, and the manner in which they combine, are then determined. This cause-effect relationship is captured in a graph, from which test cases are generated: each effect is considered separately, and for each effect the conditions on the set of causes that enable it form a test case. A cause that must hold in a test case is set to true, while the others may be false or 'don't care'. This process not only yields good test cases but also improves the tester's understanding of the functionality of the system, since he or she has to consider the causes and effects in the system.
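A small hypothetical illustration of my own (not from [Jalote 2005]): suppose a discount (effect e1) is granted only when the customer is a member (cause c1) AND the order exceeds £100 (cause c2). The test cases read off the graph are the combination that enables the effect plus one case per cause that individually fails to enable it.

public class CauseEffectSketch {
    // Effect e1 (discount) is the AND of causes c1 (member) and c2 (large order).
    static boolean discount(boolean member, boolean largeOrder) {
        return member && largeOrder;
    }

    public static void main(String[] args) {
        System.out.println(discount(true,  true));   // e1 expected true
        System.out.println(discount(true,  false));  // e1 expected false (c2 false)
        System.out.println(discount(false, true));   // e1 expected false (c1 false)
    }
}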

Category-Partition Method

This method is related in many ways to all of the above methods, especially in the way it tries to generate test cases that exercise every allowable combination of sets of values of the input parameters, in order to maximize the number of faults found. Testing is based on the separate functional units of a system, and therefore starts after the functional units have been identified. All the parameters (inputs, outputs, environmental conditions) of a particular function that affect the function's behaviour are identified. The distinct characteristics of each of these parameters, called 'categories', are then enumerated. These categories are subdivided into 'partitions' in a way similar to applying equivalence partitioning. The constraints that apply to combinations of choices are then identified, since one partition may be mutually exclusive with another, or may require another to have a particular value. This allows test frames to be generated by combining the allowable partitions of each category; the test frames can then be converted into test data by taking suitable values from the partitions. The Catpart tool implements this method.
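As a sketch of how categories, partitions and test frames relate (a hypothetical "find substring in text" function of my own choosing, not the Catpart tool's notation):

// Function under test (hypothetical): int find(String pattern, String text)
//
// Categories and their partitions (choices):
//   pattern length : empty | single character | several characters
//   occurrences    : none | one | many
//   text length    : empty | non-empty
// Constraint: "occurrences = one/many" is only allowable when text is non-empty.
//
// One allowable test frame and the concrete data derived from it:
//   frame: pattern = single character, occurrences = many, text = non-empty
public class CategoryPartitionFrame {
    public static void main(String[] args) {
        String pattern = "a";
        String text = "banana";                     // the pattern occurs many times
        System.out.println(text.indexOf(pattern));  // expect first match at index 1
    }
}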

State-Based Testing

Some systems produce different outputs at different times even for the same set of inputs. Such behaviour is state based, which means that the outputs of the system are determined not only by the inputs but also by the current state of the system, which in turn depends on the cumulative effect of the past inputs to the system. Such systems can be modelled as state machines, for which state models can be built provided the number of states is limited. In a state model, transitions represent the changes of state, events represent inputs to the system, and actions represent the outputs produced in response to the events.


A state model shows diagrammatically the state transitions that occur and the actions that are performed when events take place. Test cases can be generated by following particular coverage criteria. For example, All Transitions coverage (AT) is one such criterion; it requires that every transition in the state graph is exercised, which ensures that the different scenarios that arise from combinations of inputs and state conditions are tested. Likewise, there are other criteria that ensure the generated test cases are sufficiently exhaustive to detect possible errors. MACT (Method for Automatic Class Testing) and JWalk are tools that implement this method.
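A minimal sketch of All Transitions coverage on a deliberately tiny two-state machine (a hypothetical turnstile of my own devising, not output from MACT or JWalk):

public class Turnstile {
    private boolean locked = true;

    public void coin() { locked = false; }  // event 'coin': LOCKED -> UNLOCKED
    public void push() { locked = true;  }  // event 'push': UNLOCKED -> LOCKED
    public boolean isLocked() { return locked; }

    // The sequence below drives the machine through both transitions of this
    // state graph at least once, satisfying the AT criterion for this model.
    public static void main(String[] args) {
        Turnstile t = new Turnstile();
        t.coin();
        System.out.println(t.isLocked());  // expect false: coin unlocks
        t.push();
        System.out.println(t.isLocked());  // expect true: push locks again
    }
}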

White-Box Testing

This approach to testing is based solely on the implementation of a software system and hence is also called program-based or structural testing [Roper 1997]. It does not consider the different input and output conditions of the program when selecting test cases. Testing is done solely to exercise the different program structures and data structures used in the system, i.e. the test data are derived from the program's logic. This method provides a form of feedback on how thoroughly the implementation has been exercised, e.g. the coverage of the software achieved.

In order to exercise all program structures using this approach we may want to test every possible statement. However, this can easily be shown to be inadequate. The analogue of the exhaustive input testing of the black-box approach is, in the white-box approach, exhaustive path testing: to exhaustively test a program in this way, all of the possible paths of control flow through the program have to be executed. This is fundamentally flawed for two reasons. The first is that there may be an astronomical number of unique logic paths through the program under test (especially one containing loops). The second is that, even after exhaustive path testing, a program may still contain numerous errors; at least three circumstances justify this claim.

• If the program under test does not completely match its specification, this approach will not detect the discrepancy.

• If the program has some necessary paths missing, this approach will not detect their absence.

• If the program contains data-sensitivity errors, this approach has no means of identifying them. This kind of error can be highlighted by a simple example presented in [Myers 2004]. Suppose a program has to check two numbers 'a' and 'b' for convergence, i.e. check whether they differ by less than some predetermined value 'c'. Code like the following contains a data-sensitivity error:

if (a - b < c)
    System.out.println("a - b < c");

The above code fragment is erroneous because it should compare the absolute value of the difference between the two numbers: when a - b is a large negative number (for example, when 'a' is negative and 'b' is positive), the condition is satisfied even though the two numbers have not converged.
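For contrast, a corrected sketch of the fragment (same variable names, assuming numeric types) compares the magnitude of the difference:

if (Math.abs(a - b) < c)   // compare the absolute value of the difference
    System.out.println("a and b have converged");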


Discussed below are three different approaches to implementation testing: control flow-based testing, data flow-based testing and mutation testing.

Control Flow Based Testing

Common implementation-based testing methods consider the coverage of various aspects of the control flow graph (CFG) of a program. A control flow graph represents, in graph notation, all the paths (sequences of program statements) through a program that may be executed by a single run of the program. Each node in a CFG represents a sequential block of code without any jump statements or jump targets; directed edges represent control flow jumps between these blocks. Figure 2.1 shows an example CFG.

Fig. 2.1: Control Flow graph of a small program code.

Statement Coverage requires that each statement of the program is executed at least once, i.e. that all the nodes (blocks of statements) are included in the paths executed during testing. The drawback of this criterion is revealed by considering an 'if' statement without an 'else' clause: the criterion is satisfied if the condition in the 'if' clause evaluates to true, without ever requiring that the condition evaluates to false. Since in a program it is the decisions that are potentially erroneous, this inability is dangerous. Branch Coverage (also called decision coverage) remedies this drawback by requiring that each edge (flow of control from one node to another) in the control flow graph is executed at least once during testing. This ensures that each decision evaluates to both true and false (i.e. every possible outcome of every decision [Huang 1975]) at least once during testing, thereby ensuring statement coverage as well. The shortcoming of this approach is that when a decision contains multiple conditions, its true and false branches can be covered without each of the individual conditions being exercised. This can be remedied by ensuring that all conditions evaluate to both true and false.


However, studies have revealed that some erroneous conditions arise from control flowing through a particular combination of branches (i.e. when statements and branches are executed in a certain order) that is not tested by this method. This can be illustrated using the simple example shown in Figure 2.2.

Fig. 2.2: Code snippet and its Control Flow graph. (An example from [Myers 2004]).
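The figure itself is not reproduced here. The code it refers to is the well-known two-decision example from [Myers 2004]; in Java form it is roughly the following, where I have assumed that the figure's labels 'a' to 'e' name the entry edge and the false/true branches of the two decisions:

public static int myersExample(int a, int b, int x) {
    // edge a: entry into the method
    if (a > 1 && b == 0) {   // true branch -> edge c, false branch -> edge b
        x = x / a;
    }
    if (a == 2 || x > 1) {   // true branch -> edge e, false branch -> edge d
        x = x + 1;
    }
    return x;
}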

Decision coverage can be accomplished with two test cases covering the paths 'ace' and 'abd', or 'acd' and 'abe'. For example, the two test-case inputs A = 3, B = 0, X = 3 and A = 2, B = 1, X = 1 cover the paths 'acd' and 'abe' respectively. Note that the path along which the value of 'x' is left unchanged ('abd') is never exercised by these test cases; if the second condition had been coded incorrectly, say as X < 1 rather than X > 1, the mistake would not be detected by them. Condition coverage is often a stronger criterion than decision coverage: it requires that each atomic condition in a (compound) decision evaluates to every possible outcome at least once, and it also takes each point of entry to the program (or subroutine) into account. For instance, in the example of Figure 2.2 there are four atomic conditions, A > 1, B = 0, A = 2 and X > 1, which must each evaluate to all possible outcomes in order to achieve condition coverage. Hence this criterion forces the situations where A > 1, A <= 1, B = 0, and B <> 0 all occur at point 'a', and where A = 2, A <> 2, X > 1, and X <= 1 all occur at point 'b'. A disadvantage of condition coverage is that it does not always imply decision coverage. Consider the decision "if (A & B)": condition coverage can be achieved by the test cases "A true, B false" and "A false, B true", yet neither of these executes the 'then' clause of the decision.


The above flaw is rectified by the decision/condition criterion, which dictates that each atomic condition in a decision evaluates to all possible outcomes at least once and that each decision evaluates to all possible outcomes at least once. However, even this criterion does not ensure complete coverage, because certain conditions can mask other conditions. Figure 2.3 shows the machine code a typical compiler would generate for the above example.

Fig. 2.3: Machine code for the program in figure 2.2 [Myers 2004]

From the machine-code control flow diagram (Fig. 2.3) we can deduce that at 'H' the condition "B = 0" is only evaluated if "A > 1" evaluates to true. Likewise, at 'K' the condition "X > 1" is only evaluated if at 'J' the condition "A = 2" evaluates to false. This demonstrates the need for a stronger test criterion that exercises all possible outcomes of the atomic conditions in combination. The multiple condition coverage criterion addresses this drawback of the decision/condition criterion: it dictates that all possible combinations of condition outcomes within each decision are exercised, and that all points of entry to the code (subroutine) are covered as well. For the above example, the test data should produce the following combinations in order to meet this criterion:

1. A > 1, B = 0
2. A > 1, B <> 0
3. A <= 1, B = 0
4. A <= 1, B <> 0
5. A = 2, X > 1
6. A = 2, X <= 1
7. A <> 2, X > 1
8. A <> 2, X <= 1

Multiple condition coverage subsumes the decision coverage, condition coverage, and decision/condition coverage criteria. Path Coverage is stronger still: it requires that every possible path through the control flow graph is exercised at least once during testing.


A path through program code may be described as "the conjunction of predicates in relation to the software's input variables" [Sthamer 1995]. Serious drawbacks of this approach are that programs containing loops may have an infinite number of possible paths, and that it is not always possible to execute every path. Another approach, designed to limit the number of paths that need to be traversed, proposes that the minimum number of distinct paths to be executed is the cyclomatic complexity of the program, which is finite; the cyclomatic complexity here means the number of independent paths in the program's flow graph.

Data Flow Based Testing

This approach pays particular attention to where variables are defined and where those definitions are used: it seeks to ensure coverage of variables being used after they are defined, in other words to exercise the paths along which variables are defined and subsequently used. In this approach a variable can occur in the following kinds of places: where it is defined ('def'), where it is used in a computation ('c-use'), and where it is used in a predicate or condition ('p-use'). The 'all-defs' criterion states that, for every 'def' of each variable, an execution path from that definition to one of its uses ('p-use' or 'c-use') must be exercised, i.e. the test cases should contain data that causes program control to traverse such a path. Likewise, the 'all-p-uses' criterion requires that all the p-uses of all the definitions are executed, and the 'all-c-uses, some-p-uses' criterion requires that all the c-uses of each variable definition are exercised, with a p-use exercised where a definition has no c-use. The 'all-uses' criterion requires all p-uses and all c-uses to be exercised. Various other criteria that test the execution of the definitions and uses of variables can be derived in the same way.
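A small sketch (my own fragment, not drawn from the literature) with the def, c-use and p-use occurrences of the variables marked in comments:

public class DataFlowExample {
    static int sum(int n) {
        int total = 0;                 // def of 'total'
        for (int i = 0; i < n; i++) {  // def of 'i'; p-uses of 'i' and 'n'
            total = total + i;         // c-uses of 'total' and 'i'; a new def of 'total'
        }
        return total;                  // c-use of 'total'
    }

    public static void main(String[] args) {
        // n = 0 exercises only the path from the first def of 'total' to the
        // final c-use; n = 2 additionally exercises the def-use pairs in the loop.
        System.out.println(sum(0) + " " + sum(2));  // prints "0 1"
    }
}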

Mutation Testing

This method takes a novel approach, implementing a form of error-based testing [DeMillo 1978]. If the system under test passes a prepared set of test cases, this could mean either that the system is flawless or that the test cases are not sensitive enough to detect its faults. To rule out the latter possibility, slightly changed versions of the program (versions with a single syntactically correct fault introduced, called mutants) are executed against the prepared set of test cases, in the hope of obtaining output different from that of the original program. If the output differs, the test cases are deemed sensitive enough to detect the failure (and the mutant is considered dead), and so if the system under test passes them it can be considered free of that fault. If the outputs are the same, the mutants that would always produce the same results as the original (equivalent mutants) are identified and set aside; the remaining mutants should be killed (distinguished) by test cases, so additional test cases are added. It should be noted that this type of testing can only show the absence of the pre-specified faults [Morell 1990]; moreover, the process of generating and testing a large number of mutants is very time consuming.
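A tiny illustration of my own: a single operator is changed to create a mutant, and a test case 'kills' the mutant if it distinguishes the mutant's output from the original's.

public class MutationExample {
    static int max(int a, int b)       { return (a > b) ? a : b; }  // original
    // Mutant: '>' replaced by '<' (a single syntactically correct fault).
    static int maxMutant(int a, int b) { return (a < b) ? a : b; }

    public static void main(String[] args) {
        // The input (3, 1) kills the mutant: the original returns 3, the mutant 1.
        System.out.println(max(3, 1) + " vs mutant " + maxMutant(3, 1));
        // The input (2, 2) does not kill it: both return 2, so on its own this
        // test case would not be sensitive enough.
        System.out.println(max(2, 2) + " vs mutant " + maxMutant(2, 2));
    }
}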

Model Based Testing

It has often been found that the cause of ineffective testing and of defective or missing functionality in systems is improper (inaccurate) specification gathering; indeed, poor specifications are held to be responsible for the majority of software defects [Becker 2002]. Model-based testing seeks to eliminate this very cause.


The focus of this approach is to guarantee that the functionality dictated by the requirements of the proposed system is converted into sound (accurate and proper) specifications, which in turn are completely exercised during the testing effort. Test cases modelled on complete functional coverage (as represented by quality specifications) increase the confidence that the functionality has been successfully delivered in the produced system. The process is based on the assumption that rigorously modelled specifications give rise to rigorous test scripts, guaranteeing a significant improvement in the functional integrity of the produced software. It begins with the requirements team creating the specifications in the usual format, which a test-modelling specialist translates into a graphical model of the processing logic, outputs and inputs. This graphical model is an adapted form of cause-effect diagram, in which the causes and effects represent the business process components that the software executes. The model can immediately reveal inconsistencies and ambiguities in the requirements specifications in the form of missing causes, missing effects or unclear interactions between them. These inconsistencies are documented by the test-modelling specialist as 'specification ambiguities' and are reworked by the requirements analysis and design team. Once the test model is complete, comprehensive test scripts are generated from it with the help of software. Through this modelling process a rigorous examination of the inconsistencies of the specifications is achieved. Thus the model based testing approach emphasizes functional completeness rather than the usual motive of software testing, which is to 'break' the code. However, this form of testing does not detect faults in coding structures that are execution-path or sequence dependent.

Test Based Programming Approaches

Extreme Programming and Test Driven Development

Extreme Programming (XP) is "a lightweight methodology for small-to medium sized teams developing software in the face of vague or rapidly changing requirements" [Beck 2004]. Its objective is to reduce project risk while increasing productivity and maintaining a high degree of responsiveness throughout the life of a system. It boasts the ability to respond to changing business needs by flexibly scheduling the implementation of functionality. XP seeks to shorten the release cycle of software so that there is less change during the development of a single release; this allows the specification of the project to be continuously refined during development, so that learning by the customer and the team can be reflected in the software. Another fundamental of XP is that programmers (and customers) write automated tests to monitor the software's development progress and to catch defects early, immediately after they have been introduced; this is made possible by the short iterations of the development process. Coupled with these short iterations, the writing of tests is an essential element of XP: according to [Beck 2004], no piece of functionality should exist without an automated test written for it. If programmers (and customers) write unit tests, their confidence in the operation of the program is endorsed by the tests passing.


The tests give an opportunity to think about what is required of a piece of functionality independently of how it is implemented, and they confirm whether the implementation is indeed what was intended. There is also a gain in productivity, since less time is spent debugging the code. Development is thus driven by tests, and this paradigm is often called a 'test-first' strategy: programmers write the tests first, then the code, and an iteration is not complete until all the tests run. When all the tests run, and no further test that would break the code can be written, the added functionality is complete.

According to [Beck 2002], "By driving development with automated tests and then eliminating duplication, any developer can write reliable, bug-free code no matter what its level of complexity". Test Driven Development (TDD) is founded on two rules: 1) code is written only when an automated test has failed (because of missing functionality or a wrong implementation), and 2) duplication is eliminated. The programming cycle that follows from these rules is:

1. Red - write a small test; initially it must fail.
2. Green - quickly make the test pass by adding the functionality.
3. Refactor - eliminate any duplication introduced in merely getting the test to work.

This enables programmers to learn quickly, communicate more clearly, and receive constructive feedback. For this to be possible, one essential requirement (among several) is to make testing simple by using designs that comprise many highly cohesive, loosely coupled components. Moreover, this practice reduces the frequency of nasty surprises, which enables real customers to be involved in day-to-day development.
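A minimal red-green sketch of the cycle, written in JUnit 3.x style (a hypothetical Account class of my own, not output from the Easy Test tool): the test is written first and fails until the functionality exists.

import junit.framework.TestCase;

// Step 1 (red): this test is written before Account.deposit is implemented,
// so when first run it fails.
public class AccountTest extends TestCase {
    public void testDepositIncreasesBalance() {
        Account account = new Account();
        account.deposit(50);
        assertEquals(50, account.getBalance());
    }
}

// Step 2 (green): the simplest implementation that makes the test pass.
class Account {
    private int balance = 0;
    public void deposit(int amount) { balance = balance + amount; }
    public int getBalance() { return balance; }
}

// Step 3 (refactor): with the test passing, duplication can now be removed safely.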

SpecSharp (Spec#) Programming System

Spec# is a novel attempt at creating the means and the environment to develop high-quality software in a cost-effective manner. It gives the programmer the ability to express how methods and data are to be used; these constraints are enforced by the compiler through run-time checks, ensuring that methods and data are used according to the conditions laid down by the programmer. Moreover, a verifier is provided that checks the consistency of a program with its specifications. This may look similar to the approach of the programming language Eiffel [Meyer 1998], which dynamically checks the correctness of executed programs by using embedded specifications as run-time checks. However, Spec# is a concentrated effort to provide a framework expressive enough that none of a programmer's assumptions need be left unspecified or implicit, and it explicitly provides a verification mechanism. The Spec# programming system comprises the object-oriented Spec# programming language (an extension of C#), the Spec# compiler, and the Boogie static program verifier [Barnett 2004]. The Spec# language includes specification constructs such as non-null types, pre- and postconditions (as parts of method contracts), support for constraining data fields, and comprehensive exception management. These specifications form part of the executable code of a program and are distinguished from the actual code by a special form of tag.


A method contract represents an agreement in which the client (the code invoking a method) satisfies the preconditions of the method (the code invoked) when invoking it, and the supplier (the code invoked) ensures certain postconditions after execution. Spec# provides non-null types for specifying that fields, local variables, formal parameters, and return types are non-null, and it creates the necessary constraints and run-time checks to enforce the specified non-nullity. It also permits the specification of, and reasoning about, object invariants, i.e. it allows programmers to specify the data or properties that remain unchanged by any operation. A further advantage of the expressive power of this programming system is that it makes the implementation code more readable and comprehensible. The specifications are also stored separately as a compiled unit in a language-independent format, which can be used by other analysis and verification tools for further checks. The Spec# compiler employs static-analysis techniques and issues run-time checks to guarantee the soundness of the code. Combined with a verifier that checks the consistency of a program with its specifications, this programming system really does provide a practical tool that enables programmers to express and validate their specifications.

Automatic Test Data Generation

It is claimed in [Staknis 1990] that automation of the testing process can lead to more extensive testing of an application. Moreover, the automation of tests reduces the time, effort, labour and cost involved in software testing. Generally, any measure of automation comprises an instrumentor, a test data generator and a test harness. An instrumentor is used to instrument the program code under test, i.e. to add instructions or code in a suitable format to support some later action; a test harness is an application that automates the testing of a system's (or program's) core functionality. Automated testing may analyse the program in either of two ways: statically or dynamically. A testing tool which employs static analysis analyses the program code under test, manually or automatically, without executing it. The capability of static analysis is limited (unless appropriate measures are taken, as in [Visser 2004]) when it comes to handling program code containing array references, pointer variables and other dynamic constructs. Experiments have shown that static analysis can effectively detect 30% to 70% of the logic design and coding faults of a typical application [DeMillo 1978]. An automated test data generator based on symbolic execution [Visser 2004] is discussed later. Dynamic analysis tools, on the contrary, execute the program under test and generate test data based on the feedback received from the execution.

Automated testing tools can be loosely categorized into three kinds: test procedure generators, code coverage analyzers and test data generators [Dustin 2003]. Test procedure generators process requirements information to create test procedures by statistical, algorithmic or heuristic means. In statistical test procedure generation the tool chooses test input data based on the usage profile of the software under test or on some other statistical analysis. Heuristic or failure-based generation employs knowledge of historically frequent failures to generate test data. Alternatively, strategies based on data, logic, events or states can be utilised; each of these strategies is employed to detect a different kind of software defect. Code coverage analyzers measure different levels of test coverage, for example branch, condition or segment coverage.

Test data generators automatically generate test data based on a set of rules which could encapsulate functional testing, data-driven load testing, or performance and stress testing. Basically, there are three types of test data generators: path-wise generators, data specification generators and random test data generators [Sthamer 1995]. These generators produce test data which can be run on the program under test, the output from which is compared with the expected output in order to detect errors in the program. A path-wise generator generates test data which facilitates white-box testing based on criteria like path coverage, statement coverage, branch coverage, etc. Generally, this kind of testing application comprises flow graph construction, path selection and test data generation modules. An example of this kind of system is the one presented by [Korel 1990]. It employs a dynamic test data generation approach in which values of input variables that cause a selected path in the program code to be executed are computed automatically. As this calculation is done while executing the program code, dynamic values of array indexes and pointers can be determined easily, thereby overcoming some of the constraints of static (e.g. symbolic) execution. Data specification generators generate test data which facilitates black-box testing based on some kind of formal system specification. An example of this is a system based on formal Z specifications [Yang 1995]. The need for a formal specification is the evident downside of this approach [Gutjahr 1993]. Random test data generators generate arbitrary values from a uniform distribution as test input data. Statistical testing, a technique by which test data are selected from an expected usage distribution profile, is the approach to random test generation suggested in [Ould 1991]. Although [Hamlet 1987] notes that it is difficult to choose data values from the most likely part of the input domain and that the operational distribution for a program under test may not be available, he still recommends the statistical approach. Contradictory comments by [Deason 1991] and [Myers 2004] suggest that random testing seldom provides effective coverage of the program under test. However, [Duran 1984] states that random test generation is more cost effective than partition-testing-based test data generation (which is the approach taken in my system, the EasyTest tool) because it relies solely on a random number generator with minor software support.
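To put the difference between random generation and partition-based selection (the latter being the approach taken by EasyTest) into perspective, the following Java sketch is purely illustrative; the single integer parameter and its valid range [1, 100] are assumptions made for the example:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: contrasts random generation with partition/boundary selection
// for one integer parameter whose valid range is assumed to be [1, 100].
public class IntInputGenerator {
    private static final int MIN = 1, MAX = 100;

    // Random generation: arbitrary values drawn from a uniform distribution.
    static List<Integer> randomInputs(int count, long seed) {
        Random rnd = new Random(seed);
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            data.add(rnd.nextInt(2 * MAX) - MAX / 2);   // may fall inside or outside the valid range
        }
        return data;
    }

    // Partition-based selection: one representative per equivalence class plus boundary values.
    static List<Integer> partitionInputs() {
        return List.of(
            MIN - 1,           // invalid: just below the lower boundary
            MIN,               // boundary: lower limit
            (MIN + MAX) / 2,   // valid: representative of the in-range partition
            MAX,               // boundary: upper limit
            MAX + 1            // invalid: just above the upper boundary
        );
    }

    public static void main(String[] args) {
        System.out.println("Random:    " + randomInputs(5, 42L));
        System.out.println("Partition: " + partitionInputs());
    }
}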

Symbolic Execution

Symbolic execution refers to the practice of executing a program with symbolic values rather than actual concrete values [Walkinshaw 2006]. As a result, its internal variables are expressed and manipulated as symbolic expressions and, hence, the output is represented as a function of the symbolic inputs that replaced the actual inputs. Generally, the state of a symbolically executed program is represented by the symbolic values of the internal variables, a program counter and a path condition [Khurshid 2003]. The program counter defines the next statement to be executed. The path condition is an accumulated set of constraints, expressed as a Boolean formula over the symbolic inputs, which the input values must satisfy in order for the program control to flow through the particular path associated with it.

Fig. 2.4: Code that swaps two integers and the corresponding symbolic execution tree, where transitions are labeled with program control points
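The code in the figure is an image and does not survive in this text; the following Java fragment is a hedged reconstruction that matches the description in the next paragraph (the statement numbering in the comments mirrors the statement references in the text and is an assumption):

// Hedged reconstruction of the swap example described below, not the original figure.
public class Swap {
    static void swap(int x, int y) {
        if (x > y) {       // (1)
            x = x + y;     // (2)
            y = x - y;     // (3)
            x = x - y;     // (4)
            if (x > y) {   // (5)  after the swap x <= y, so this branch is infeasible
                assert false : "unreachable";   // (6)  path condition here is unsatisfiable
            }
        }
    }
}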

All the symbolically executed paths of a program can be represented in a tree structure in which the nodes represent program states, the arcs represent transitions between states and the leaves represent the termination of a potential path through the program. This tree structure is known as a symbolic execution tree. Figure 2.4 shows an example of a symbolic execution tree for a particular piece of program code [Khurshid 2003]. The program code fragment on the left in Figure 2.4 swaps two integer variables when one of them is greater than the other. The corresponding symbolic execution tree is shown on the right. The path condition is set to true at the beginning, and x and y have the symbolic values X and Y, respectively. In order to choose between alternative paths, the path condition is updated at every decision point with the branch condition that the inputs must satisfy. To elucidate, the first condition of the program code can be true as well as false depending upon the values of x and y, so there would be both a true and a false branch; hence the path condition is updated accordingly with the branching conditions. A path condition with the value false signifies that there is no set of inputs that satisfies it, which means it is not a reachable symbolic state, thereby terminating symbolic execution for that path. This means that statement (6) in the program code is unreachable. Some likely pitfalls (if not addressed adequately) of symbolic execution are as follows:
- It is computationally expensive to manipulate algebraic expressions, more pronounced when performed on a large number of paths.
- It is difficult to handle variable-dependent loop conditions, module invocations, pointers and input-variable-dependent arrays [Korel 1990].
- The above problems, coupled with the existence of multiple constraints, slow down its successful application [Gallagher 1993].

Model Checking

Model checking is a formal method of automatically verifying models of systems to determine whether they satisfy their formal requirements.

If they do not, the model checker generates counter-examples showing that the system does not satisfy its specifications. The need for a model checker arises from the fact that, in order to ensure a system meets its requirements perfectly, it has to be tested in different environments, which to some extent is practically impossible. However, a model checker traverses the model state-space exhaustively and determines whether the specifications, which are sometimes expressed as temporal logic formulas, hold in every possible state of the model, thereby simulating testing of the system in every possible environment.

In case the model does not match its specifications (i.e. a negative answer), the "bad" states of the model, which do not satisfy the requirements, can be traced in order to determine the flaw in the model. This helps in correcting the model. The drawback of this verification methodology is the very large (often astronomical) number of states discovered combinatorially by the model checker. This problem is known as the state explosion problem.

Fig. 2.5: Schema of the model checking method. [Barnat 2002]

This problem can be demonstrated by considering a simple example from [Barnat 2002]: let us suppose there are four queues of binary data, each having a capacity of 64 bytes. They can be modeled as four byte fields, each of size 64. The number of states this kind of data structure can reach can be calculated as follows: each byte can represent 256 values and each queue holds 64 values of type byte, so one queue can take 256^64 different values. Consequently, four queues can take (256^64)^4 = 256^256 different states. It is almost impossible to do the whole state space traversal, at least mechanically. Model checking [Clarke 1999] has become increasingly popular over the last two decades. Its recent application to the analysis of software programs has also been successful to an extent [Henzinger 2003]. It is perceivably difficult to model check programs due to the complexity of the code. More often than not, an exhaustive analysis of a program's state space is not achieved because the model checker runs out of memory. However, some of the approaches proposed to combat this problem are symbolic algorithms, partial order reduction, etc. Often, model checkers applied to programs rely on (predicate) abstractions [Henzinger 2003] to reduce the size of the state space.

However, such techniques do not facilitate the handling of code that manipulates complex data, as they require too many predicates, making the abstraction process inefficient [Visser 2004]. A model checker provides some useful features as built-in capabilities, such as backtracking and a number of search capabilities (e.g. heuristic search), including techniques that may help combat state explosion (e.g. partial order and symmetry reduction).
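As a toy illustration of the exhaustive traversal described above (not any particular model checker), the following Java sketch explores a small hand-coded transition system and reports a counter-example trace when a state violating the assumed property is reached:

import java.util.*;

// Toy explicit-state reachability check: exhaustively explores a small transition
// system and reports any reachable state that violates a property (illustrative only).
public class TinyModelChecker {
    // States 0..3 of a hypothetical model; transitions are hard-coded for illustration.
    static final Map<Integer, int[]> TRANSITIONS = Map.of(
        0, new int[]{1, 2},
        1, new int[]{3},
        2, new int[]{0},
        3, new int[]{}
    );

    // The "specification": the property that must hold in every reachable state.
    static boolean propertyHolds(int state) {
        return state != 3;   // state 3 is assumed to be the "bad" state
    }

    public static void main(String[] args) {
        Deque<Integer> frontier = new ArrayDeque<>(List.of(0));
        Set<Integer> visited = new HashSet<>();
        Map<Integer, Integer> parent = new HashMap<>();   // for the counter-example trace
        while (!frontier.isEmpty()) {
            int s = frontier.pop();
            if (!visited.add(s)) continue;
            if (!propertyHolds(s)) {
                // Reconstruct the counter-example path back to the initial state.
                List<Integer> trace = new ArrayList<>();
                for (Integer v = s; v != null; v = parent.get(v)) trace.add(0, v);
                System.out.println("Property violated; counter-example: " + trace);
                return;
            }
            for (int next : TRANSITIONS.get(s)) {
                if (!visited.contains(next)) { parent.putIfAbsent(next, s); frontier.push(next); }
            }
        }
        System.out.println("Property holds in all " + visited.size() + " reachable states");
    }
}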

Related Work

Test Input Generation with Java PathFinder [Visser 2004]: In this work, the authors present a system for generating test (input) data for program code that manipulates complex data structures. They use symbolic execution coupled with model checking to generate efficient tests that achieve high structural coverage of the code. To demonstrate their system's capabilities they apply it to test the branch coverage of some of the methods of the Java TreeMap library during unit testing, using the Java PathFinder model checker. Their framework serves a number of purposes; however, in this context we consider the test input generation feature. The model checker generates paths that satisfy a testing criterion, along with corresponding constraints on reference fields, path conditions and thread scheduling information (in the case of concurrent components of a program). This shows that the system handles concurrency, utilizing the model checker's power to analyze thread interleavings. The claim that a specific coverage target is not achievable can be encoded as a set of (temporal) properties; the model checker will find counter-examples to this claim, if they exist, which on being transformed into test inputs accomplish the stated coverage goal. From as early as the seventies, symbolic execution has been demonstrated as a methodology for generating efficient test input data [King 1976]; however, most of the earlier works revolved around generating test data for simple data types. This system, on the contrary, presents a powerful unit testing framework that generates test input data based on the symbolic execution of program code that processes complex data structures. It can be applied for both white-box (structural) as well as black-box (specification-based) testing. As mentioned before, the methods under test are model checked to generate test data meeting a particular testing criterion. The testing criterion is represented as properties the model checker should verify, and its counterexamples are thereby produced as paths that satisfy the coverage criterion. During model checking, symbolic execution is carried out, which produces a set of constraints over the input symbols that must be satisfied in order for the corresponding paths to be executed. Hence, the actual testing only comprises solving the input constraints to instantiate test data inputs that can be used to execute the paths. It should be kept in mind that, for a model checker to support symbolic execution, the source code under test is instrumented, i.e. necessary additions are made to the code that act as instructions, information or conditions inducing the capacity to manipulate formulae representing path conditions. This instrumented code is checked by the model checker using its usual state space exploration techniques.

It was found that only 11 input tests were required in this white-box testing approach to attain the same coverage that 84 input tests achieved with a black-box approach. It should be noted that this difference would increase if more complicated structural inputs were part of the program code under test. However, this system uses complex method preconditions to initialize method arguments and fields only with valid values, and complex method postconditions as test oracles to verify a method's correctness. These preconditions facilitated efficient symbolic execution of the complex data-manipulating code, thereby generating test input data with high code coverage. This seems to be a drawback because the preconditions are hand-created and there is no means of generating them automatically.

A Framework for Generating Object-Oriented Unit Tests [Symstra 2005]:

According to the authors, the tasks involved in generating unit tests are invoking a sequence of method calls and feeding them with relevant method arguments. They have presented a system that simulates both of these test generation tasks by symbolically executing method sequences and feeding them with symbolic arguments. Each symbolic argument represents the set of all possible concrete values which can be used as arguments. In order to be able to do so, their system systematically explores the object state space of the program code and selectively prunes (discards) those states that are subsumed by previously explored states. It is claimed that the system generates unit tests that attain a high degree of branch coverage as well as intra-method path coverage for program code that manipulates complex data structures. It also claims an edge over the system by [Visser 2004] in not requiring additional methods to be provided to create preconditions. Object-oriented unit tests generally test program class code by a fixed sequence of method invocations, feeding them with fixed concrete values as arguments to verify a particular aspect of its behavior. Symstra uses symbolic values instead of concrete values for primitive-type arguments in method invocations. This is illustrated in figure 2.6. As is usual with any symbolic-execution-based system, Symstra also operates on symbolic states that consist of two elements: (1) a path condition <C>, which must hold for the execution of the path, and (2) a heap <H>, which contains symbolic variables. When the symbolic execution encounters a conditional statement, it explores both outcomes, appropriately adding the branch condition or its negation to the constraint for the corresponding paths.

Fig. 2.6: a) Example program code, b) Typical method invocation sequence, and c) Symstra's symbolic method invocations

In order to be able to actually test the program code under test, Symstra has to create, at the time of symbolic state exploration, a set of concrete tests that lead to the explored program states. To do so, after the symbolic execution of a method which leads to a new symbolic state <C, H>, it generates a symbolic test consisting of the constraint (path condition) C and the shortest sequence of methods that reaches the state <C, H>. In other words, it associates a method sequence with each new symbolic state. It then instantiates a symbolic test by solving the constraint C (over the symbolic arguments) for the methods in the sequence, with the help of a constraint solver called POOC [POOC 2002]. The concrete arguments obtained from these solved equations are used to create a complete set of tests. These concrete test sequences are exported into JUnit test classes [JUnit 2003], and the constraints C associated with the tests are emitted as comments in the created classes. Symstra instruments each branching point of the tested class so as to be able to measure its branch coverage at the bytecode level. It also catches uncaught run-time exceptions by adequately instrumenting each method of the tested class. The authors experimentally demonstrate that this system generates unit tests that attain higher branch coverage, faster, than other existing test-generation systems based on concrete-argument method invocation.
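The following hedged Java sketch (the BankAccount class and the specific path condition are invented for illustration and are not taken from Figure 2.6 or the Symstra paper) contrasts a conventional concrete-argument JUnit test with the symbolic view of the same method sequence:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class BankAccountTest {

    // Hypothetical class under test, defined inline so the sketch is self-contained.
    static class BankAccount {
        private int balance;
        BankAccount(int initial) { balance = initial; }
        void deposit(int amount) { balance += amount; }
        void withdraw(int amount) { if (amount <= balance) balance -= amount; }
        int getBalance() { return balance; }
    }

    // Conventional unit test: a fixed method sequence with fixed concrete arguments.
    @Test
    public void depositThenWithdraw() {
        BankAccount a = new BankAccount(100);
        a.deposit(50);
        a.withdraw(30);
        assertEquals(120, a.getBalance());
    }

    // A Symstra-style tool would explore the same sequence with symbolic arguments
    // x1, x2, x3, accumulate a path condition such as C = {x1 >= 0, x2 > 0, x3 <= x1 + x2}
    // for one explored path, and then solve C (e.g. x1 = 100, x2 = 50, x3 = 30) to export
    // a concrete JUnit test like the one above.
}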

Search-based Software Test Data Generation

We are aware that exhaustive testing of any reasonably sized program is often infeasible, while at the same time random methods may be highly unreliable and unlikely to exercise the 'important' features of the program under test; in the latter case there is at least the possibility that those features are missed by mere chance. Added to that are the limitations imposed by the size and complexity of the software involved in testing. Test data generation based on meta-heuristic search techniques claims to address these problems [McMinn 2004]. Meta-heuristic search techniques apply heuristics to search for, or discover, solutions to combinatorial problems at a computationally reasonable cost. Generally, a heuristic-based approach to solving problems employs techniques to find, discover and learn solutions without considering whether the solution can be proven; it usually provides computational performance gains and conceptual simplicity rather than accuracy of results. Commonly, meta-heuristics are targeted at combinatorial optimization problems, and as software testing can be expressed as a combinatorial problem it is aptly suited to the application of meta-heuristics. Combinatorial optimization algorithms are used to solve problems that are believed to be hard in general, which is accomplished by exploring the usually large solution space of the problems. In order to apply meta-heuristic techniques to test data generation, the test criterion is transformed into an objective function. The objective function evaluates the solutions of the search against the overall goal of the search. Using the results of this evaluation as feedback, the search is directed towards a more optimal solution in the search space. Solutions are encoded in a way such that they can be manipulated by the search function.

According to [McMinn 2004], meta-heuristic search techniques have been successfully applied to the following areas of automated test data generation: 1) structural coverage, 2) behavioural coverage, and 3) verification of non-functional properties (e.g. the worst-case execution time of a code segment). Some meta-heuristic techniques that have been predominantly used in software test data generation, namely Hill Climbing, Simulated Annealing and Evolutionary Algorithms, are discussed briefly.

Hill Climbing

Hill climbing is a simple search technique analogous to climbing hills in a landscape. A random solution is chosen initially from the solution search space (called the starting point) and its neighbouring (close in value) solutions are investigated. If any of these solutions scores better when evaluated against the objective function, it is selected; the newly selected solution's neighbourhood is then investigated in the same way to find a better solution, and this goes on until no further improvement can be found over the currently selected solution. Visually, this looks like climbing from the lower areas (inferior solutions) to the peaks (the best, or locally best, solutions) of a landscape. When the selection criterion mandates the selection of the best solution from among the neighbourhood, the strategy is said to be "steepest ascent", whereas if the selection criterion selects the first solution found that is better than the current one, it is said to be a "random ascent" strategy. Although hill climbing is a simplistic approach, it may yield a sub-optimal result by finding a solution with a locally optimal value of the objective function instead of the globally optimal solution over the entire search space. This happens at a 'local peak of the landscape', where all the neighbouring values are inferior solutions despite there being higher peaks elsewhere in the landscape of the search space. It may also happen on a 'plateau' (an area of the search space with similar values of the objective function), where neighbouring solutions are comparably good. Thus, we find that this technique is limited in effectiveness by the quality of the starting solution. A possible remedy for this apparent drawback is to re-run the process successively with different initial solutions in order to sample more of the search space. Figure 2.7 describes the algorithm of this approach [McMinn 2004].

Fig. 2.7: High level description of a hill climbing algorithm, for a problem with solution space ‘S’; neighbourhood structure ‘N’; and ‘obj’, the objective function to be maximized
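The figure itself is an image that does not survive in this text; a rough Java sketch of the steepest-ascent variant it describes (the integer solution space and the plus/minus-one neighbourhood are simplifying assumptions) is:

import java.util.function.IntUnaryOperator;

// Minimal steepest-ascent hill climbing over an integer solution space (illustrative only).
public class HillClimb {
    static int climb(int start, IntUnaryOperator obj) {
        int current = start;
        while (true) {
            // Neighbourhood N(current): assumed here to be the two adjacent integers.
            int bestNeighbour = current;
            for (int n : new int[]{current - 1, current + 1}) {
                if (obj.applyAsInt(n) > obj.applyAsInt(bestNeighbour)) bestNeighbour = n;
            }
            if (bestNeighbour == current) return current;   // no better neighbour: local optimum
            current = bestNeighbour;                        // move "uphill" and repeat
        }
    }

    public static void main(String[] args) {
        // Objective with a single peak at x = 7; starting from 0 the climb reaches it.
        IntUnaryOperator obj = x -> -(x - 7) * (x - 7);
        System.out.println(climb(0, obj));   // prints 7
    }
}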

Fig. 2.8: High level description of a Simulated Annealing algorithm, for a problem with solution space ‘S’; neighbourhood structure ‘N’; ‘num_solns’ the number of solutions to consider at each temperature level ‘t’; and ‘obj’, the objective function to be minimized

Simulated Annealing

This approach seeks to make the search for an optimal solution less dependent on the starting solution than hill climbing, while being principally similar. The main difference is that it may probabilistically select an inferior solution rather than the local optimum (the best among a neighbourhood of solutions). The probability 'p' of selecting an inferior solution varies as the search progresses. It is calculated as:
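(The formula itself is an image that does not survive in this text; the standard simulated annealing acceptance probability, consistent with the description that follows, is)

\[ p = e^{-\delta / t} \]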

where 'delta' is the difference in objective value between the current solution and the neighbouring inferior solution being considered, and 't' is a control parameter known as the temperature. The process is controlled in such a way that initially the probability of selecting an inferior solution is higher (keeping 't' high) and is then gradually decreased (by decreasing the value of 't'). This is to facilitate freer exploration of the search space at the beginning, thereby reducing the possibility of the search getting stuck at a local optimum. A sample algorithm that minimizes the objective function in order to find optimal solutions is shown in Fig. 2.8 [McMinn 2004].

Evolutionary Algorithm

Search techniques and adaptive procedures based on Darwin's renowned theory of biological evolution, employing processes similar to those of natural genetics, are classified as Evolutionary Algorithms [Wegener 2001].

These algorithms simulate 'evolution' in order to evolve candidate solutions using a mechanism emulating genetics and 'natural selection' [McMinn 2004]. They process a number of potential solutions in parallel, in an iterative fashion, to gradually give rise to solutions having a better combination of those parameters that significantly affect the overall performance of a solution. To achieve this, each individual solution is encoded with the permissible values of the optimization problem. A generic description of evolutionary algorithms follows, after which their adaptation to software test data generation is discussed. An evolutionary algorithm begins by randomly selecting a few supposedly good individual solutions and then works towards an optimum solution by recombining 'fit' solutions and introducing into the new solutions a probability of independent random change (mutation). Hence, we find that evolutionary algorithms applied to software engineering basically comprise a 'selection' and a 'reinsertion' mechanism. The selection mechanism determines the individual solutions (parents) that will 'reproduce', i.e. the individual solutions that will be recombined so as to produce new solutions (offspring), based on the individuals' 'fitness' values (discussed later). The reinsertion mechanism determines which individual solutions from the parent and offspring populations form the next generation of solutions. As is true for any search-based approach, evolutionary algorithms have a measurement system that evaluates the value (worth or feasibility) of an individual solution. A 'fitness' function (analogous to an objective function) determines the performance of an individual solution with regard to the current optimum, so as to make solutions comparable, and expresses their performance as numerical values. Based on their fitness values and a predefined selection strategy, pairs of individual solutions are selected from among the parent (and offspring) population and combined in some way to produce a new generation of solutions, thereby emulating the biological reproduction mechanism. Figure 2.9 provides an overview of the typical process of an evolutionary algorithm [Wegener 2001].

Fig. 2.9: Evolutionary Algorithm

To adapt evolutionary algorithms to automate the generation of test data for software testing, the test goal (test criterion) has to be transformed into an optimization task. The 'fitness' of candidate solutions is evaluated using the fitness function that is derived from a numeric representation of the test goal. From a randomly generated initial population of solutions (each representing a test datum for the program under test), member solutions are selected (according to their fitness values) and are subjected to combination and mutation to reproduce offspring solutions. The respective fitness values of all the solutions are calculated by monitoring and assessing the execution of each solution (test datum) on the program code under test. A new generation is then produced by combining the newly created offspring and their parent solutions according to the survival strategy (fitness function), and this continues in a cycle until the test objective is fulfilled. It should be noted that the occurrence of flags in branching conditions, or of states in the program under test, hinders the search operation.
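A minimal, purely illustrative Java sketch of this cycle (the fitness function, population size and operators below are assumptions chosen to keep the example small) is:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.IntUnaryOperator;

// Minimal evolutionary cycle (illustrative only): evolves integer "test data" towards
// maximising an assumed fitness function via selection, recombination, mutation and reinsertion.
public class TinyEvolutionarySearch {
    public static void main(String[] args) {
        Random rnd = new Random(1);
        IntUnaryOperator fitness = x -> -(x - 42) * (x - 42);   // assumed goal: optimum at x = 42
        Comparator<Integer> byFitnessDesc =
            (a, b) -> Integer.compare(fitness.applyAsInt(b), fitness.applyAsInt(a));

        List<Integer> population = new ArrayList<>();
        for (int i = 0; i < 10; i++) population.add(rnd.nextInt(1000));   // random initial population

        for (int gen = 0; gen < 100; gen++) {
            population.sort(byFitnessDesc);
            List<Integer> parents = new ArrayList<>(population.subList(0, 5));   // selection: fitter half

            List<Integer> offspring = new ArrayList<>();
            for (int i = 0; i < 5; i++) {
                int a = parents.get(rnd.nextInt(parents.size()));
                int b = parents.get(rnd.nextInt(parents.size()));
                int child = (a + b) / 2;                                  // recombination
                if (rnd.nextInt(10) == 0) child += rnd.nextInt(11) - 5;   // mutation, low probability
                offspring.add(child);
            }

            population = new ArrayList<>(parents);                        // reinsertion: parents
            population.addAll(offspring);                                 // and offspring survive
        }
        population.sort(byFitnessDesc);
        System.out.println("Best individual found: " + population.get(0));   // expected to be close to 42
    }
}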

Genetic Algorithm

Genetic Algorithms are probably the most well-known form of Evolutionary Algorithm [McMinn 2004]. The process is modelled closely on the natural system of biological creatures, in which new generations of solutions are 'bred' and 'raised' so as to 'reproduce' themselves. The approach derives its name from the analogy between the structure of a candidate solution and the genetic structure of chromosomes. Hence solutions are often referred to as chromosomes, by virtue of being encoded as a sequence of simple components (often referred to as 'genes'), analogous to the encoding of genes in chromosomes. Often the actual encoded structure of a candidate solution is a simple string of binary digits. This could be of the form <01110000, 11111111, 00110100> to represent a vector of three integers <112, 255, 52> in the range [0, 255] [McMinn 2004]. Since Genetic Algorithms process a population of solutions to 'breed' a new generation of solutions, they ensure that the search space is sampled more extensively, thereby mitigating the problem of local optimum solutions. The overall process comprises two basic steps: selection and reproduction. In the selection phase, member solutions (chromosomes) are chosen either randomly or according to their fitness values. During the reproduction phase, a member couple is chosen and subjected to the genetic operators crossover and mutation to produce a new pair of offspring. The crossover operator couples the genes of two solutions (parent chromosomes) to generate two similar solutions (offspring chromosomes) by swapping corresponding substrings (groups of genes) of the parent chromosomes. This simulates the combining of the genes (genetic material) of fitter parents to produce new biological creatures. The mutation operator seeks to introduce a degree of diversity among the parents and their offspring by altering (with a low probability) one or more genetic cells (genes) of a solution produced by the crossover function. Moreover, the mutation operator helps reduce the likelihood of generating offspring whose values stagnate at a local optimum. A predefined survival strategy determines which parent and offspring solutions 'survive' for the next cycle of breeding. This cycle of 'breeding' and 'raising' continues until a global optimum solution is found. The downside of this approach is that an excessive number of iterations may be required, which leads to a large amount of computational time. In order to illustrate the above description, let us consider a small example taken from [McMinn 2004].

As mentioned before, the parent solutions (chromosomes) are recombined and mutated to evolve successive generations of offspring solutions. In 'one-point' recombination, a single crossover point is chosen which indicates the gene position at which a swap of genes takes place between the selected parent couple, giving rise to two offspring. The result of the crossover operator on the two individuals <0, 255, 0> (000000001111111100000000) and <255, 0, 255> (111111110000000011111111), with a single-point crossover at gene position 12, would be as follows:
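The original shows this crossover as a figure; the following small Java sketch reproduces it for the two parents given above and is included only as a worked illustration:

// One-point crossover at gene (bit) position 12 for the two parent chromosomes above.
public class OnePointCrossover {
    public static void main(String[] args) {
        String parent1 = "000000001111111100000000";   // <0, 255, 0>
        String parent2 = "111111110000000011111111";   // <255, 0, 255>
        int point = 12;                                // crossover point

        // Swap the tails of the two parents at the crossover point.
        String offspring1 = parent1.substring(0, point) + parent2.substring(point);
        String offspring2 = parent2.substring(0, point) + parent1.substring(point);

        System.out.println(decode(offspring1));   // prints [0, 240, 255]
        System.out.println(decode(offspring2));   // prints [255, 15, 0]
    }

    // Decode a 24-bit chromosome into its three 8-bit integer genes.
    static java.util.List<Integer> decode(String bits) {
        java.util.List<Integer> genes = new java.util.ArrayList<>();
        for (int i = 0; i < bits.length(); i += 8) {
            genes.add(Integer.parseInt(bits.substring(i, i + 8), 2));
        }
        return genes;
    }
}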

This produces two offspring solutions: <0, 240, 255> and <255, 15, 0>. In the selection phase, the member couple that will form the parents in the reproduction phase can be selected using a number of selection mechanisms. One such mechanism, based on the fitness of the parent solutions (fitness can be calculated from the objective function), favours the selection of fitter individuals which, hopefully, give rise to fitter offspring. However, this may result in the dominance of the fitter solutions in the next generation (the offspring), thereby reducing diversity and increasing the likelihood of premature convergence limited to one area of the solution search space. Conversely, an inverse strategy will result in excessive exploration and a decline in the rate at which the search makes substantial progress. The conceptual approach of the Genetic Algorithm is illustrated as a block diagram in Figure 2.10; the associated pseudo code is displayed alongside, where 't' is the generation number and 'P' is the solution space (population) [Sthamer 1995].

Figure 2.10: a) Block diagram of GA. b) Pseudo code of GA.

Related Work

Some of the related works applying evolutionary (or genetic) algorithms to the automation of test data generation are [Sthamer 1995], [Roper 1997], [Weichselbaum 1998], [Wegener 2001], etc.

In [Roper 1997] and [Weichselbaum 1998] the fitness function (similar to the objective function described above) calculates the fitness of a test datum based on the coverage of the program code: the higher the coverage, the fitter the test datum. Both test for statement and branch coverage, while the latter additionally tests for condition coverage. A drawback pointed out by [Wegener 2001] in this kind of coverage criterion for fitness calculation is that the search is usually adversely directed at finding test data that execute a few long, easily accessible program execution paths, hindering complete coverage. In [Sthamer 1995] and [Wegener 2001] automation of statement and branch coverage testing is emphasized. The testing process is divided into partial aims, each representing a program structure that must be executed to achieve full coverage of a particular coverage criterion, e.g. a statement, a branch or a condition. For each partial aim an individual fitness function is devised, and the corresponding search for a test datum that executes that partial aim is optimized separately. This fitness function is dynamically derived by an analysis of the predicates of the branching conditions of the program code under test. The objective of the optimization problem is to minimize the distance of each individual test datum from executing the program conditions in the desired way. The set of test data determined by the optimization function covers the structural test criterion represented by the partial aims. This approach supposedly yields better coverage than the coverage-oriented approach described previously.
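A common way of realising such a distance-based fitness function is a branch-distance measure; the following Java sketch is a hedged illustration (the exact distance formulas and the penalty constant K are conventions assumed for the example, not those of the cited systems):

// Illustrative branch-distance calculation: the smaller the distance, the closer a
// test datum is to making the desired branch outcome true (0 means it is satisfied).
public class BranchDistance {
    static final double K = 1.0;   // small penalty constant, an assumed convention

    static double equalsDistance(double a, double b) {    // desired outcome: a == b
        return a == b ? 0.0 : Math.abs(a - b) + K;
    }

    static double lessOrEqualDistance(double a, double b) {   // desired outcome: a <= b
        return a <= b ? 0.0 : (a - b) + K;
    }

    public static void main(String[] args) {
        // The search minimises these values; a datum with distance 0 executes the branch.
        System.out.println(equalsDistance(10, 3));       // 8.0: far from satisfying a == b
        System.out.println(equalsDistance(4, 3));        // 2.0: closer, so fitter
        System.out.println(lessOrEqualDistance(2, 3));   // 0.0: branch condition satisfied
    }
}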

Evolutionary Testing Environment for Automatic Structural Testing [Wegener 2001]:

This work has produced a test environment that supports all common control-flow and data-flow based test criteria together in one system. Figure 2.11 gives an overview of this environment. In order to facilitate this, the structural test criteria are divided into four categories based on the required test purpose and the control flow graph (a graph showing the execution flow as branches and program code components, e.g. statements, as nodes):

1. Path-oriented methods for path coverage,
2. Node-oriented methods for statement and condition coverage,
3. Node-node oriented methods for 'all-defs' and 'all-uses' coverage, and
4. Node-path oriented methods for branch, segment and LCSAJ (Linear Code Sequence and Jump) coverage.

For each category, the set of all structural tests is divided into partial aims, and fitness functions for the partial aims are derived in the same manner. Each of the partial aims corresponding to a test criterion must be executed in order for coverage of that criterion to be accomplished. As mentioned before, the fitness function is dynamically devised depending on the predicates of the branching conditions, and an approximation level is calculated that indicates the number of branching conditions (or branching condition nodes in the control flow graph) that still need to be executed in order to attain the required partial aim. Partial aims for node-oriented methods are derived from the nodes of the control-flow graph, with the objective of finding a test data set that executes every desired node of the control flow graph. Partial aims for path-oriented methods are derived from the paths in the control flow graph that must be executed in order to fulfill the chosen structural test criterion. Partial aims for node-path oriented methods are derived from the nodes with outgoing edges in the control flow graph (branch coverage) or from the initial nodes with a path to the jump statement of a linear code sequence.

Partial aims for node-node oriented methods are derived from the paths connecting the nodes under consideration in the control flow graph.

Figure 2.11: Components of the evolutionary test environment.

It was experimentally demonstrated that evolutionary testing had to generate far less test data in order to achieve a very high degree of coverage for all test criteria, as compared with random test data generators.

Automatic Generation of Software Test Data Using Genetic Algorithms [Sthamer 1995]:

This research demonstrates the effectiveness of Genetic Algorithms in generating test data. Test criteria including branch coverage, boundary coverage and loop coverage (testing multiple iterations of a loop) were successfully applied to test program code using a test harness. The main advantage of this approach is that the relationship between conditional predicates and the input variables need not be specified and may be of any complexity, as the approach is based on dynamic fitness function generation (as discussed earlier). Experiments showed that test data generated by this approach distinguished mutants from the original code all the time, thereby achieving a mutation score of 100%. Experiments also revealed the effectiveness of this approach at generating boundary value tests. Overall, this approach based on genetic algorithms was shown to require up to two orders of magnitude fewer test generations (and hence less computational time) than random testing to attain high coverage.

Conclusion

The literature review provided in this chapter attempts to give a brief insight into the different approaches to software testing and the various techniques commonly employed in the software industry. These approaches were also reviewed because they were considered for the system to be developed. The method selected for the system is a hybrid form of equivalence partitioning, category partition and boundary value analysis. This technique was adopted over the others for a number of reasons. The purpose of this system is to inculcate in fresh undergraduate students an idea about testing; an idea that can explain the purpose, utility and basic techniques of detecting faults in a software program.

The system is also expected to acquaint the new entrants to this industry with, and promote, the test-driven nature of today's software industry. This makes it mandatory to start writing tests before implementing, which means we have the specifications but not the code. In order to be able to write tests, a specification-based approach is therefore necessary; hence its selection. A model-based approach needs a formal modelling language, which many newcomers do not know; in order to remove such an obstacle, the model-based approach was not favoured. The purpose of reviewing automatic test data generators is that the implemented system works from the specifications and goes right up to generating test data, and the review helps learners understand the details of the approaches applied to detect errors in a system. This system is modelled on the lines of the system presented by [Deason 1991], which employs rule-based testing methods on integer and real variables; this system additionally treats date and string variables. However, in [Deason 1991] the test data generator assigns constant values to variables, which are then incremented and decremented until they reach either boundary of the range of values accepted by the variables; random input testing is also carried out. In this system, representative values from equivalence partitions as well as strategic boundary values are used to generate test data. Deason called his approach multiple-condition boundary coverage, which was shown to always out-perform random testing. He concludes that it is impossible to generate test data capable of executing all branches of software using this strategy. Another drawback of this approach is that it does not take into account errors that may be caused by data not covered by the rules. However, the simplicity of the approach and its effectiveness in providing a perspective on software testing made it the chosen approach.

Chapter 3: Requirements & Analysis

As highlighted before, there is a need for a concentrated effort in the first programming course at the undergraduate level to sow the seeds of testing in the minds of students. More so because it is a time when concepts begin to take shape and when students' thought processes are relatively easier to mould into one of two forms: one that readily accepts testing as an integrated part of development, or one that feels it is just an additional burden. The absence of any great effort to address this issue leads to the inhibitions that students develop at the initial stages, which become incredibly hard to overcome when it comes to testing a piece of code once written, let alone the 'test-first' methodology, i.e. writing tests before coding.

Primary Requirements

This project was conceived from the need for a demonstration and practical test tool that would not only guide students in writing tests for their code, but also practically demonstrate the writing of elementary tests for programs, in order for students to analyse them. This analysis helps students in two ways: first, by teaching them how to write elementary tests and, second, by showing them how to write fault-free code once they know which faults the testing methods seek to detect. Thus, the formal requirements of the system may be stated as:

To create a learning tool to inculcate the basic idea of testing (i.e. detecting errors) in students of the first programming course in Java.

To create a practical test tool to acquaint students with basic specification-based testing methods.

To provide focus on a 'Learn - N - Do' test approach, from thinking about simple program code (usually a class) to creating tests for it.

To promote the prevalent Test Driven Development ideology, commonly advocated by Agile methods of software development (esp. XP).

Secondary Requirements

The proposed system may be implemented as an on-line testing support tool which not only provides a tutorial on the craft of testing, its importance and the techniques one can apply to test Java code, but also acts as an extended support tool that can generate tests for such Java code.

The proposed system should not impose on novice programmers the burden of having to learn any formal specification language in order to work with the system.

The proposed system should help novice programmers gain an insight into common faults so that they can write less faulty code.

Page 39: A Conceptual tool for specification based testing€¦ · A Conceptual tool for specification based testing. This report is submitted in partial fulfilment of the requirements for

Chapter 3. Requirements & Analysis

Department of Computer Science,

University of Sheffield, 2007

31

Requirement Analysis

Here, a number of factors need to be considered which determine the capability and the complexity of the proposed system. In the first view of the system, i.e. the system developed as an online application, the system could be a server application which provides the learning materials as static or dynamic HTML pages (JSP). The testing mechanism could be implemented using Servlets, where the specifications are pasted into a text area (of a JSP) or entered in a form, from which they would be extracted and tests developed. The other way could be loading an attached specification file and developing tests from it. A few things that need to be considered here are what the format of the specifications should be and in which IDE or environment they should be processed.

The other factor is the complexity of the code to be dealt with. After a little research on the standard of the Java code that students of the Crossover Project (COM1030 - Requirements Engineering, COM1040 - Systems Design and Testing) usually create, it was found that it has a fair degree of complexity. One such example is the Library System project, which is the model on which the course lectures are based. It is a moderately complex system that implements a number of features of the Java programming language, such as collections, file handling, etc. Taking this into account we should also consider: how complex can the submitted code be, and which methods (equivalence partitioning, boundary value analysis, etc.) could be practically implemented?

The other view of the system is that it may be implemented as an Eclipse plugin or as a standalone application (coded in Eclipse), so we get the IDE in which to operate. We already have a testing mechanism in the form of the JUnit framework, in which the user writes the test and then provides input data to the tests and the expected results to match against the actual results; the task of creating tests, as well as guessing the test data that may crash the system, is the responsibility of the user. Hence, we should consider whether this system may automatically generate the test cases (or test data) that are likely to crash the system and thus expose its faults. The following are some of the assumptions and adjustments that can be made in order to address the issues discussed above:

The first programming course is COM1010 - Introduction to Programming, where basic Java skills are imparted. Although COM1030 and COM1040 are first-year modules, they are more advanced courses conducted later in the first year. This system should be able to process programs at the COM1010 level.

A prototype would be delivered as a standalone application (developed in Eclipse) capable of accepting the specifications of the program to be tested.

Only those test features which can be explained and easily understood by first-year students would be included in the system. It would start from the specifications and build on specification-based testing measures: equivalence partitioning, boundary value analysis and category-partition.

Test cases would be generated automatically using the above chosen methods. These test cases practically demonstrate the output of these methods, thereby putting their test procedure into perspective.

Now, let us consider how much complexity the program should be able to manage in order for it to be suitable for students of module COM1010. A review of what students learn during the module COM1010:
• The terms class, object, instance and method.
• The notion of a variable - a box in memory which contains a value of a specified type.
• Input and output using the Sheffield package.
• How to manipulate a String of characters.
• How to select from a number of alternatives using if, else, switch and case.
• How to perform repetition using a counting loop (for) or a conditional loop (do, while).
• How to declare an array, which is a collection of data.
• How to declare a Java class, provide a constructor and write get and set methods.
• How to chain constructors to produce more compact and maintainable programs, and understanding the nuances of assigning references, passing values and their scope.
• How to create objects containing references to further objects, pass object references as parameters and return them as the results of methods.
• How to implement classes representing abstract data types, providing a test harness.
• How to write classes with re-use in mind.

The following elementary program is taken as an example from the lecture slides of COM1010; it computes the area and circumference of a circle given the radius:

public class Circle {
    private double x, y;    // the circle centre
    private double radius;  // the radius

    public Circle(double x, double y, double r) {
        this.x = x;
        this.y = y;
        radius = r;
    }

    public void setRadius(double r) {
        radius = r;
    }

    public double circumference() {
        return 2.0 * Math.PI * radius;
    }

    public double area() {
        return Math.PI * radius * radius;
    }

    public static void main (String[] args) {
        Circle c = new Circle(0.0, 0.0, 2.0);
        c.setRadius(10.0);
        System.out.println("main method in class Circle");
        System.out.println("Circum: " + c.circumference());
        System.out.println("Area: " + c.area());
    }
}

Listing 3.1: A program to calculate the circumference and area of a circle

If we are to generate test cases, we have to first create the test frames, set up test data and then combine them. The following is one solution based on the Category-Partition method, which generates test frames. In order to generate the test frames we have to first find out each functionality (or functional unit) of the program, determine the parameters of each function (inputs / outputs / environmental conditions), identify the categories of each parameter, and partition the categories into equivalence classes of test cases. We can then select one partition from each category and put them in groups that represent the test frames. We may apply constraints to the partitions in order to determine whether the test case created would be a 'pass' test case or a 'fail' test case. Of course, for symbolic partitions valid test data would need to be set up to convert the test frame into a test case. The task of identifying the functionality of a program, determining the parameters of the functions and then the categories automatically seems to be impossible without user interaction, and the logic needed to determine these elements automatically would be quite complex. Moreover, these elementary programs do not have the scope to define the various parameters that can determine the categories and partitions fairly exhaustively. Some more points to consider: can the program to be tested be other than a self-contained Java class, i.e. have dependencies on external classes? Will the classes have multiple methods, or will there be a single method to test? The following are some of the assumptions and adjustments that can be made in order to address the issues discussed above:

The program for which tests are to be generated would be assumed to have a single functionality. This eliminates the complexity of having to deal with multiple functionalities. The user will be asked to identify the parameters as part of the specifications of the program under test.

The program specifications will be accepted from a user interface where some information, such as the number of parameters required and the types of the parameters, can be entered. Then, on separate forms, the specific (or range of) values for numeric parameters; the content, length and case of content for alphanumeric parameters; and the format for date-type parameters will be accepted. Based on the range, format, case and length of values (and other constraints), equivalence partitions can be created using equivalence partitioning and boundary value analysis.
One partition from each category of each parameter can then be selected to derive test frames. Representative values (test data) for symbolic partitions can be derived from an oracle for these frames to qualify as test data (an illustrative example appears at the end of this section).

On the other hand, in order to demonstrate the writing of tests for a program that contains a method to compute an expression (formula), the expression can be accepted and wrapped in a JUnit test with pertinent assertions (a sketch of such a test appears at the end of this section). This would exemplify the process of writing tests for any program. The users can run the JUnit tests independently by feeding some values as parameters of the expression as well as the expected result; the actual result would be computed and matched against the user's expected result.

There would be no code, as this is a 'test-first' strategy. The number of classes and methods that the program to be tested contains is irrelevant to the task of generating tests and test data, as they are based solely on specifications.

The specifications would be entered by the user on several forms, which take inputs such as 1) the number of variables, 2) the types of variables, 3) a possible formula for the operation, 4) specific constraints on the variables and parameters, 5) dependencies, etc. These data would be processed to produce the test frames.

The list of the concepts covered in COM1010 suggests that, apart from arrays and abstract data types (e.g. lists), virtually all programming elements are covered if we are able to process integer, real, date and string values as parameters.

The proposed system would cover the common faults that novice programmers make, e.g. divide-by-zero errors and input errors (caused by faulty input to the system). It would pinpoint such instances so as to raise awareness and help reduce their frequency in the code students write in future.

So the problem finally stands as follows: a practical test tool that would not only guide students in writing tests for their code, but also practically demonstrate the writing of elementary tests for programs, in order for students to analyse them. The system should generate test frames, which the users may use to manually test their program under test. This test data helps novice programmers understand the possible inputs to the code they write, for which adequate checks and measures should be implemented in order to make the code less faulty. A JUnit test for the function derived from its specification (or expression) may be generated by the system, and representative values from the test data can be used to run the tests. This highlights the process of writing tests for methods implemented in a program and the use of pertinent assertions to make them effective.
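To make the intended output concrete, consider the radius parameter of the Circle program in Listing 3.1. The categories, partitions and representative values below are illustrative assumptions rather than output of the implemented tool:

Parameter: radius (a real number)
Categories: sign (negative, zero, positive) and magnitude (very small, typical, very large)
Sample test frames with representative data:
- radius negative (e.g. -5.0): a 'fail' frame that the specification should reject
- radius zero (0.0): a boundary frame; circumference and area should both be 0
- radius positive and typical (e.g. 2.0): a 'pass' frame
- radius positive and very large (e.g. 1.0e200): a frame probing overflow in the area computation

Similarly, a generated JUnit test wrapping the circumference expression might look roughly like the following sketch (the class and method names are illustrative, not the tool's actual output):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Illustrative only: the kind of JUnit test the system could generate for the
// expression "2 * Math.PI * radius" taken from the specification.
public class CircumferenceExpressionTest {

    // The expression under test, wrapped in a method so it can be exercised directly.
    private double circumference(double radius) {
        return 2.0 * Math.PI * radius;
    }

    @Test
    public void circumferenceOfRadiusTen() {
        double expected = 62.83185307179586;                 // user-supplied expected result for radius = 10.0
        assertEquals(expected, circumference(10.0), 1e-9);   // pertinent assertion with a tolerance
    }

    @Test
    public void circumferenceOfZeroRadiusIsZero() {
        assertEquals(0.0, circumference(0.0), 1e-9);         // boundary value taken from the partitions
    }
}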

Testing & Evaluation Strategies

The proposed system can be evaluated through a combination of grading the extent to which it achieves its objectives and an effective testing approach.

The system will be tested using two strategies:

o Testing driven by examples: The program in Listing 3.1 would be a sample program whose specifications would be used in order to assess the system's effectiveness in handling trivial programs. Some tests using sample programs would be created and executed to evaluate the efficiency and adaptability of the system in accepting and dealing with various types of situations.

o Usability testing (human/computer interaction): Users would run the system independently, without supervision, to determine the ease with which it can be operated intuitively. This strategy of testing would determine the suitability of the system for use by any user. It not only demonstrates its ease of use but also evaluates the extent to which it can be used effectively.


Chapter 4: Design

This chapter describes the goals, design and architecture of the proposed system.

System Goals

The overall goals of the system are summarized in figure 4.1.

Fig. 4.1: Overview of System Goals.

From figure 4.1, we find that there are several subsystems whose services constitute the features of the proposed system; these include the central control subsystem, the computation subsystem, the partitioning subsystem, the JUnit test generation subsystem, and the test data generation subsystem. Information about parameters, assertions, expressions and data dependency forms part of the program specifications. Program specifications are accepted through the user interface and processed to compute the valid (range of) values for each parameter. Data dependency has been included as part of the specifications as it represents any inter-relation among the parameters that determines their value, range, format, length and content, as applicable. Further justification is provided later.


System Design

The overall functionality is depicted in the use-case diagram presented in figure 4.2. From the figure, we find that the system has the following independent functionalities implemented in order to carry out its purpose:

Specification Input: accepts the program specifications.
Computation & Estimation: computes values and ranges.
Partitioning: creates partitions of parameters using the Equivalence Partitioning, Boundary Value Analysis and Category Partition methods.
JUnit Test Generation: generates JUnit tests for expressions and assertions.
Test Data Generation: generates test data using partition values and substituting values from a data oracle for symbolic partitions.

Fig. 4.2: Use Case overview of System Functionality.

Fig. 4.3: The ‘Partitioning’ use-case and the ‘JUnit’ use-case specialize the ‘Computation & Estimation’ use-case.


The ‘Partitioning’ use-case and the ‘JUnit’ use-case specialize the ‘Computation & Estimation’ use-case; hence, the ‘Computation & Estimation’ functionality has to be implemented before the other two. However, the ‘Partitioning’ and ‘JUnit’ use-cases can be developed concurrently, at least in theory. This is depicted in figure 4.3. The ‘Test Generation’ use-case includes the ‘Partitioning’ use-case and hence should be implemented after it, as depicted in figure 4.4.

Fig. 4.4: The ‘Test Generation’ use-case includes the ‘Partitioning’ use-case.

System Architecture

The design choice of the project is based on the ease with which its requirements can be fulfilled. In the following paragraphs the requirements of the project are highlighted and categorised such that they define clear-cut interfaces for fulfilling their respective purposes. The division is primarily based on the Model-View-Controller design pattern. The model is an object (or, in this context, a collection of objects) that represents information about the domain. It is non-visual and contains all the data and behaviour other than that used for the user interface. The view represents the display of the model in the user interface. Lastly, the controller takes user input, manipulates the model, and causes the view to update appropriately. This pattern maintains the principle that the presentation depends on the model but the model does not depend on the presentation. Additionally, the presentation acts as an observer of the model: whenever the model changes (i.e. data is modified or updated) it sends out an event that causes the presentations to refresh the information they display. This design facilitates the communication of the state-change message from the model to the user interface (or view).

View

The proposed system, which is being developed as a standalone application, would need to have an interface (or GUI) with menu options to start the testing tutorial (or testing guide) as well as to start the testing tool. Other menu items would include options to display test frames, manipulate internal windows (internal frames), etc. All user interfaces that accept data from the user about the parameters, their dependencies, expressions and assertions form part of this segment of the program.

Controller

The input from the view should be processed in order to determine the desired process to invoke. Based on the input from the view, which is the user's actions on the interface, the appropriate processes in the model are invoked. This means that the communication from the view to the model takes place via the controller, which interprets the messages from the view and communicates to the model the action it needs to perform.

Model

The ‘factory’ of any system is represented by the model. This is where the main logic and the worker classes reside, performing all the operations in the system. The model in a way represents the data maintained by the system. All updates and data manipulation are carried out by the model based on the user actions (or commands) in the view. The model classes are responsible for processing the specifications of the test program in order to create the test data. They also generate the JUnit test cases for expressions and assertions. As highlighted above, the project is divided into three distinct subsections: model, view and controller. The controller processes the input from the view and informs the model; however, the model can independently inform the view of the change in its state.

Fig. 4.5: Using Observer to decouple the model from the view in the Active model.

The Active model of the MVC pattern is being used as the model changes state without the controller's involvement. This happens because the processing in the model changes the data, which must be reflected in the view. Because only the model detects changes to its internal state when they occur, the model must notify the views to refresh the data displayed. In order not to introduce a dependency of the model on the view, and also to enable the model to update the view, the Observer pattern is used, which provides a mechanism to alert other objects of state changes without introducing dependencies on them. The view implements the Observer interface and registers with the model. The model tracks the list of observers that subscribe to changes, and so when the model changes, it notifies the observers of the change without requiring specific information about the view. Figure 4.5 [WWW1] shows the structure of the active MVC using Observer and how the observer isolates the model from referencing the view directly. Figure 4.6 [WWW1] illustrates how the Observer notifies the views when the model changes.
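To make the Observer wiring concrete, the following is a minimal sketch (not the system's actual code; the class and field names are hypothetical) of how a model class extending java.util.Observable might notify a registered view that implements java.util.Observer:

import java.util.Observable;
import java.util.Observer;

// Hypothetical model class: holds a list of assertions and notifies observers on change.
class AssertionModel extends Observable {
    private final java.util.List<String> assertions = new java.util.ArrayList<String>();

    public void addAssertion(String assertion) {
        assertions.add(assertion);
        setChanged();               // mark the model as modified
        notifyObservers(assertion); // push the change to all registered views
    }
}

// Hypothetical view class: refreshes its display whenever the model changes.
class AssertionView implements Observer {
    public void update(Observable model, Object newAssertion) {
        System.out.println("View refreshed, new assertion: " + newAssertion);
    }
}

// Wiring the two together:
// AssertionModel model = new AssertionModel();
// model.addObserver(new AssertionView());
// model.addAssertion("a + b > c");

The model never references the view type, so the dependency runs only from the view to the model, as the pattern requires.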


Fig. 4.6: Behavior of the active model

System Layout

The MVC pattern just discussed facilitates isolating the model, view and controller classes. Figure 4.7 shows the classes that constitute the model of the system:

Fig. 4.7: The model classes

The class ‘ObservableAssertions’ (extreme left of second last row in figure 4.7) extends the ‘Observable’ class of the Java library which allows it to send messages to those classes in the view section of the system (e.g. ‘AssertionFrame’ class) which implement the ‘Observer’ interface of the Java library. This form of the MVC model is called the Active model.


The system processes only integer, real, string and date type parameters. Figure 4.8 shows the hierarchy of classes that resolve the dependencies between the parameters and facilitate the computation of each parameter's estimated value, range, length and content, as applicable. Figure 4.9 depicts the hierarchy of classes that produce categories and their partitions for all the parameters. Figure 4.10 showcases the hierarchy of classes that generate the test data automatically. Figure 4.11 shows the hierarchy of the dynamic JUnit test classes. The ‘JUnitTest’ and ‘AssertionTest’ classes are created dynamically to test the expressions and assertions respectively. Figure 4.12 a) shows the lexical analyzer, a class called ‘Scanner’, which analyses the expressions and assertions entered by the user to ensure that the input is valid, conforming to the Java Language Specification. Figure 4.12 b) shows the parser classes' layout. The main parser class is named ‘parser’ (coincidentally) and utilizes the other classes to validate the syntax and semantics of the input (assertions and expressions) from the user in order to classify it as a valid form of one of the two: an assertion or an expression.

Fig. 4.8: The dependency resolution group

Fig. 4.9: The partitioning group


Fig. 4.10: The test frames generator group

Fig. 4.11: The dynamic JUnit test classes

Fig. 4.12: a) The Lexical Analyzer structure, b) The Parser classes’ layout


Fig. 4.13: Class diagram showing the structure of the test frame generator classes


Fig. 4.14: Sequence diagram of the system


Figure 4.13 is a sample of the class structure of the hierarchy of classes that generate test frames. Due to space constraints, none of the other functionality groups are shown. Figure 4.14 shows the sequence diagram of the system. It is an overview starting with the specification input and ending with the generation of the test cases. Please note that class lifelines marked with an asterisk (*) denote that just one parameter type (integer in the figure) has been considered due to space constraints. The remaining parameter types, i.e. real, date and string, work in a similar manner and have been removed to conserve space. Moreover, most of the other functionality classes do not appear in that diagram, again due to space constraints. This constitutes the overall system design.


Chapter 5: Implementation & Testing

This chapter presents an overview of the system's implementation. Implementation comprises the methodology used for achieving the objectives of the system and the refinements that were made to the requirements to adapt the system to as many testing scenarios as possible. Testing of the system is treated here to detect errors, faults and limitations of the system. It serves as an indication of the correctness and completeness of the system as evaluated against the requirements.

Implementation

In the next section the methodology used to generate test data as well as JUnit tests is discussed. This is followed by another section which highlights the refinements that were made to the requirements of the system.

Methodology

The basic requirement for a testing tool (similar to the proposed system) is to be able to create test data with which the system under test is executed. This execution would return either an expected or an unexpected result, according to which the system would be considered to pass or fail the test respectively. These test data have to be created based on the specifications of the program under test, in accordance with the 'test-first' strategy of XP. Hence, the process starts with the 'specification input' phase, accepting the count and types (integer, real, string, date) of the different parameters (inputs and outputs) the test program is composed of. Next, for each parameter of the program to be tested, the following information is collected from the user, as applicable:

For numeric parameters (integer and real):
• Any specific range
• Any specific value

For alphanumeric parameters (string):
• Any specific length
• Any specific content
• Whether the content is upper case or lower case
• Minimum or maximum number of characters permitted

For date-type parameters:
• Format
• Any specific range
• Any specific value

After the details of the respective parameters are collected, information about their inter-dependency is also collected. By this it is meant that if the value (or any other property, as applicable) of a particular parameter is dependent on (or determined by) the value (or any other property) of another parameter, then the system needs to keep track of it, for use in test data generation, as it helps deduce the valid and invalid limits of the parameters. The tool also accepts expression(s) (mathematical formulae involving the parameters) that are to be used to generate results in the program to be tested. This is with the view to generating a trivial JUnit test class so as to demonstrate the process of writing tests using pertinent assert statements. This dynamically created JUnit test class evaluates the expression(s) using values for the parameters fed by the user and matches the result with the value expected by the user. Moreover, it is used to automatically pinpoint a specific flaw in the code of the program if necessary checks for such a flaw do not exist in the code. The common errors to be pinpointed are the Divide by Zero error (highest priority), the Invalid Input error, etc. If an expression contains a parameter (or combination of parameters) in the denominator that acquires the value '0', then the program code would throw an exception in the absence of adequate checks to prevent it. This is the purpose of this system: to inculcate in novice programmers an awareness of the possible faults in program code, so that they may be able to write better programs and test their programs for common causes of failure.

The need for a dependency analysis arises if we consider the specifications of a typical program taken from [Roper 1997]: the program accepts a positive integer 'x' in the range 1-20 which stands for the length of a string of characters 'a' that the program also accepts from the user. The program returns either the position in the string 'a' at which a character 'c' (another input, to be searched for in the string) is first found, or a message to say that it was not present in the string. We see that a property (length) of one parameter (string 'a') is dependent on a property (value) of another parameter (integer 'x'). This shows the need for allowing the user of this project's system to express any of the dependencies that may occur between the parameters. A list of possible dependencies between parameters includes the following:

• On the length of a string parameter
• On the value of an integer or real parameter
• On the range of an integer or real parameter
• On the value of a date parameter
• On the content of a string parameter

Another element that forms part of the specifications is an assertion. Let us consider the triangle problem discussed in [Myers 2004]: the program accepts three numbers which represent the sides of a triangle and determines what kind of triangle is formed by the sides - equilateral, isosceles or scalene. The author quips that any computation would be prone to errors if it does not exercise the constraint that, for three numbers to represent the sides of a triangle, the sum of any two sides must be greater than the third. For example, the numbers 1, 1, and 2 do not represent an isosceles triangle as they do not form a triangle in the first place. Thus, we find that for effective testing some constraints do (at times) need to be exercised in order to further qualify the test data to address additional validity criteria. This project's system allows its users to express constraints (as well as expressions, as discussed earlier) conforming to the Java Language Specification. To facilitate this, a lexical analyzer and a parser (discussed later) enforcing conformity with the Java Language Specification are put to use.
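As an illustration of how such an inter-parameter dependency might be recorded internally, the following is a minimal sketch (the class and enum names are hypothetical, not the system's actual classes) in which one parameter's property is declared to be determined by a property of another parameter:

// Hypothetical representation of "property X of parameter P depends on property Y of parameter Q".
enum Property { VALUE, RANGE, LENGTH, CONTENT }

class Dependency {
    final String dependentParam;   // e.g. "a" (the string)
    final Property dependentProp;  // e.g. LENGTH
    final String sourceParam;      // e.g. "x" (the integer)
    final Property sourceProp;     // e.g. VALUE

    Dependency(String dependentParam, Property dependentProp,
               String sourceParam, Property sourceProp) {
        this.dependentParam = dependentParam;
        this.dependentProp = dependentProp;
        this.sourceParam = sourceParam;
        this.sourceProp = sourceProp;
    }
}

// The Find Character problem of [Roper 1997] would then be captured as:
// new Dependency("a", Property.LENGTH, "x", Property.VALUE);

A table of such records is enough for the partitioning step to narrow a parameter's valid limits once the source parameter's property has been resolved.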


At this stage the system knows the number and types of parameters used and determines their values, ranges, formats and dependencies, if not already specified as part of the specifications. As this system is a rule-based test data generator, it relies on specification-based testing methods such as equivalence partitioning and boundary value analysis to form categories of parameters that can be partitioned into valid and invalid (ranges of) values, for partial-input testing (an approximation of exhaustive-input testing). The rules used to categorize the different types of parameters are:

For numeric parameters (integer and real) the category is:
• Value of the parameter

For alphanumeric parameters (string) the categories are:
• Content of the parameter
• Length of the parameter
• Case of the parameter

For date types the categories are:
• Format of the parameter (i.e. dd/mm/yyyy or mm/dd/yyyy)
• Temporal value of the parameter (i.e. the date value)

The parameters are categorized using the categorization rule for their respective types, and equivalence partitions are created using the principles of equivalence partitioning and boundary value analysis. Listing 5.1 shows an example of the process of categorising and partitioning the parameters in order to determine their (ranges of) valid and invalid values. It presents a methodical treatment of the specifications of a program (the sample program in Listing 3.1) that calculates the area and circumference of a circle given a particular radius (the Circle Problem):

a) Functions:
1. calculate the area
2. calculate the circumference

b) Parameters:
1. for the Calculate_Area function
• Inputs – radius of the circle (real)
• Outputs – area of the circle (real)
2. for the Calculate_Circumference function
• Inputs – radius of the circle (real)
• Outputs – circumference of the circle (real)

c) Categories and Partitions:
1. for the Calculate_Area function


• Value of radius of the circle
* Less than 0 [ERROR]
* 0 [property Radius_Zero]
* Between 0 and 2^31 {≈ (2^64 / PI)^0.5} [property Value_OK]
* Greater than 2^31 [property Buffer_Overflow]

• Value of area of the circle
* Less than 0 [ERROR]
* 0 [property Area_Zero] [If (Radius_Zero)]
* Between 0 and 2^64 - 1 [If (Value_OK)]
* Greater than 2^64 - 1 [If (Buffer_Overflow)]

2. for the Calculate_Circumference function

• Value of radius of the circle
* Less than 0 [ERROR]
* 0 [property Radius_Zero]
* Between 0 and 2^62 {≈ 2^64 / (2*PI)} [property Value_OK]
* Greater than 2^62 [property Buffer_Overflow]

• Value of circumference of the circle
* Less than 0 [ERROR]
* 0 [property Area_Zero] [If (Radius_Zero)]
* Between 0 and 2^64 - 1 [If (Value_OK)]
* Greater than 2^64 - 1 [If (Buffer_Overflow)]

Listing 5.1: Test Specifications for the Circle program in Listing 3.1

Listing 5.1 highlights the general form of the categorising and partitioning process. The ‘radius’ and ‘circumference’ are real parameters. Without any special constraints (such as those present in the triangle problem) the range of valid values ‘radius’ can accept is 0 to 2^31 in the area calculation problem. This is to avoid buffer overflow for real parameters when calculating the ‘area’, as the radius is squared and multiplied by PI (~3.142). The rest of the values are invalid. The valid range of values for the ‘area’ is 0 to 2^64 - 1, which this system takes as the valid range of values for any real number (using the Double class). Similarly, the valid range of values of ‘radius’ in the circumference calculation problem is 0 to 2^62, as it should allow multiplication by 2*PI to determine the circumference; the valid range for ‘circumference’ is the same as that of ‘area’ in the area calculation problem. The asterisks ‘*’ in Listing 5.1 represent the equivalence partitions for the ‘value’ categories of the two real parameters ‘radius’ and ‘area’. As a rule, if an integer parameter, say ‘n’, has a range from 2 to 25, then the partitions would be 0, 1, 2, 3, 24, 25, and 26, following the boundary value analysis rules; 0, 1, and 26 are the invalid partitions and 2, 3, 24, and 25 are the valid partitions. According to boundary value analysis (the approach discussed in chapter 2), the possible partitions of a parameter having a range of values are the two edges of the range (2, 25), one value beyond each edge (1, 26), one value just above the lower edge (3) and one value just below the upper edge (24). ‘0’, being a number which often creates trouble in expressions, is also considered; additionally, the values around ‘0’, i.e. -1 and 1, can be considered. A particular value in each equivalence partition can be chosen to represent that partition for each category of each parameter, and together they form the test frame.
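The boundary value rule just described is mechanical enough to express directly in code. The following is a minimal sketch (the method and class names are hypothetical, not the system's actual classes) that derives the partition points for an integer parameter with a lower and upper bound, including the optional values around zero:

import java.util.LinkedHashSet;
import java.util.Set;

class BoundaryValuePartitions {
    // Returns the boundary-value partition points for an integer range [lo, hi]:
    // one value below the range, both edges, one value inside each edge, one value above,
    // plus 0, -1 and 1, which frequently cause trouble in expressions.
    static Set<Integer> forRange(int lo, int hi) {
        Set<Integer> points = new LinkedHashSet<Integer>();
        points.add(lo - 1);  // just below the range (invalid)
        points.add(lo);      // lower edge (valid)
        points.add(lo + 1);  // just above the lower edge (valid)
        points.add(hi - 1);  // just below the upper edge (valid)
        points.add(hi);      // upper edge (valid)
        points.add(hi + 1);  // just above the range (invalid)
        points.add(0);       // troublesome value
        points.add(-1);
        points.add(1);
        return points;
    }

    public static void main(String[] args) {
        // Prints [1, 2, 3, 24, 25, 26, 0, -1] for the range 2..25,
        // matching the worked example (duplicates are removed by the set).
        System.out.println(forRange(2, 25));
    }
}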


These partitions in the Listing 5.1 example represent a very large range; however, the range is usually narrowed by additional dependencies of parameters and constraints on them. In order to complete the process of testing using the category partition method, every possible combination of each partition of a category (one at a time) with those of the other categories is created to represent a set of test frames. They are called test frames because they are an intermediate form of the test data to be generated: some of the partitions may be symbolic elements for which concrete values have to be substituted before the program under test can be executed with the test data. This can be highlighted with an example. Let us suppose that one of the parameters of a program specification, called ‘x’, is of string type. Then valid content for ‘x’ could be a string without any special characters, i.e. comprising only letters and numbers. In order to identify such conditions this system uses regular expressions like:

[A-Za-z0-9]* [^A-Za-z0-9]+ [A-Za-z0-9]*

to represent an invalid input. The above expression essentially means that if there is any character other than A to Z, a to z, and 0 to 9, then the content is invalid. So when the system creates partitions for the category ‘Content of the parameter’, the valid partitions are any content not matching the above regular expression and the invalid partitions are content matching it (a small sketch of this check is given below). However, such a regular expression cannot itself be used as part of the test data, so an oracle may be used from which to select representative values both matching and not matching the expression. This substituted content of the string parameter ‘x’ would form part of the test data.

The length of a string parameter is partitioned just as the value category of an integer or real parameter is. The case of a string can be partitioned into three possible values: All Lower Case, All Upper Case, and No Case Restrictions. The format of a date-type parameter can be partitioned into two forms: mm/dd/yyyy and dd/mm/yyyy. The necessary computations have been implemented to handle these differences when dealing with date-type parameters; value (or range of values) calculations and dependency calculations are accomplished using the ‘GregorianCalendar’ class of the Java library. The partitions of a date-type parameter whose value is a range of dates would be a date just before the earliest date, the earliest date (of the range), a date just after the earliest date, an intermediate date, a date just before the latest date (of the range), the latest date, and a date just after the latest date. Likewise, a date-type parameter with a single value can be partitioned into three partitions: a date just before the given date, the given date, and a date just after the given date.
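Referring back to the content regular expression above, the following is a minimal sketch (assuming the three sub-patterns are meant to be concatenated into a single pattern; the class name is hypothetical) of how a candidate string value could be classified as valid or invalid content using the standard java.util.regex API:

import java.util.regex.Pattern;

class ContentCategory {
    // A string containing at least one character outside A-Z, a-z, 0-9 is invalid content.
    private static final Pattern INVALID_CONTENT =
            Pattern.compile("[A-Za-z0-9]*[^A-Za-z0-9]+[A-Za-z0-9]*");

    static boolean isInvalid(String candidate) {
        return INVALID_CONTENT.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isInvalid("omair123"));  // false: letters and digits only
        System.out.println(isInvalid("om@ir"));     // true: '@' makes the content invalid
    }
}

An oracle would then supply concrete strings of each kind (one matching, one not matching) so that the symbolic partition can appear in executable test data.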


By using representative test data as input values, the corresponding output of the program to be tested can be computed according to its specifications; the program can then be executed with the test data and its actual output compared against the expected, computed output. If they match, the program has passed that test data.

Meanwhile, the system dynamically generates the JUnit test structure for the expressions and assertions. The expressions are written to a file and enclosed within the normal structure of a JUnit test class. This file is loaded and compiled using the Java compiler's library to form a Java class, or more specifically a JUnit class. To illustrate, let us suppose that an expression of the form “A = (Q+W)/S” is specified by the user. This expression is encoded as shown in Listing 5.2 and written to a class which can be run as a JUnit test:

public void testJunitTest() throws java.io.IOException {
    // Read the values of the expression's parameters from the user.
    System.out.print("Enter the value of S:");
    S = Double.parseDouble(br.readLine());
    System.out.print("Enter the value of W:");
    W = Integer.parseInt(br.readLine());
    System.out.print("Enter the value of Q:");
    Q = Integer.parseInt(br.readLine());
    // Read the result the user expects the expression to produce.
    System.out.print("Enter the expected value of A:");
    expected = Double.parseDouble(br.readLine());
    try {
        // Compare the expected result with the computed result, within a tolerance.
        assertEquals(expected, solve(S, W, Q), 0.001);
        System.out.println("The calculated value of A:" + A);
    } catch (Exception ex) {
        System.err.println("Expression Raised An Exception");
    }
}

public double solve(double S, int W, int Q) throws NumberFormatException {
    // The body is generated from the user-supplied expression.
    A = (Q + W) / S;
    return A;
}

Listing 5.2: Dynamically created JUnit test method

When the JUnit test class for the expression is run, it asks for the values of the parameters used in the expression and an expected result value, computes the result of the expression with the values entered, and matches the actual result against the one the user expects. This happens completely dynamically for any expression; there is no hard-coded JUnit test class with a fixed expression. Whatever the expression may be, the parameters used in it become the parameters of the ‘solve( )’ method (as shown in Listing 5.2) and the calculated parameter (the left-hand side of the expression) is computed in the body of ‘solve( )’. The assertions are treated similarly, except that in the JUnit test for assertions ‘assertTrue( )’ is used rather than ‘assertEquals( )’ and no ‘solve( )’ method is required. Further, this JUnit test has another method (called doIt( )) which contains the same functionality but is executed internally every time a test data set is generated, to determine whether the test data passes the constraints (or assertions). This means that the output test data displays the result of the test passing or failing with respect to the valid or invalid partitions as well as the constraints.
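For the assertion variant just described, a minimal sketch of what the generated test method might look like is given below. This is an illustration only, assuming the triangle constraint from the earlier example rather than the system's actual generated code; ‘br’, ‘A’, ‘B’ and ‘C’ are assumed to be fields of the generated class:

public void testAssertionTest() throws java.io.IOException {
    // Read the parameter values involved in the constraint.
    System.out.print("Enter the value of A:");
    A = Integer.parseInt(br.readLine());
    System.out.print("Enter the value of B:");
    B = Integer.parseInt(br.readLine());
    System.out.print("Enter the value of C:");
    C = Integer.parseInt(br.readLine());
    // assertTrue fails the test if the user-specified constraint does not hold.
    assertTrue(A + B > C && B + C > A && C + A > B);
}

The same boolean expression, wrapped in a doIt( )-style helper, can be evaluated against each generated test data set to fill in the ‘Assertion Test Fails’ column of the output.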

Refinements

Another aspect treated by the system is that it automatically pinpoints potential flaws in the program, in the absence of adequate tests, by pointing out instances such as the Divide by Zero error and the Invalid Input error (mentioned above). This is achieved by investigating the expressions to check whether there is a parameter (or a combination of parameters) in the denominator, and attempting to make it zero using suitable values of the parameters. The system then outputs the parameter values it used to achieve that error. Likewise, invalid string content is used to provoke the Invalid Input error, which is made known to the user so that he is aware that such a situation may arise and that, if no adequate measures are taken, it may result in exceptions.
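A minimal sketch of the divide-by-zero probe might look as follows, assuming (hypothetically) that the parser exposes the denominator of the expression as something that can be evaluated and that candidate values come from the generated partitions; none of these names are the system's actual classes:

import java.util.List;
import java.util.Map;

class DivideByZeroProbe {
    // Hypothetical interface standing in for the expression-evaluation machinery.
    interface ExpressionEvaluator {
        double evaluate(Map<String, Double> parameterValues);
    }

    // Reports a combination of candidate values that drives the denominator to zero, if any.
    static Map<String, Double> findZeroDenominator(
            List<Map<String, Double>> candidateBindings,
            ExpressionEvaluator denominator) {
        for (Map<String, Double> binding : candidateBindings) {
            if (denominator.evaluate(binding) == 0.0) {
                return binding; // these values would cause a division by zero
            }
        }
        return null; // no probed combination makes the denominator zero
    }
}

For the expression A = (Q+W)/S the denominator is simply S, so any candidate binding with S = 0 is reported; for a composite denominator such as (Q - W), the probe reports bindings where Q equals W.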

The system incorporates a parser and a lexical analyzer. The lexical analyzer (also called a scanner) scans inputs for conformance to the Java Language Specification. This scanner was created from hand-crafted specifications using the JFlex (v 1.4.1) lexical analyzer generator. It analyses the expressions and assertions and identifies in them the identifiers (parameters), logical operators, numeric operators, punctuation, etc., and feeds them to the parser. The parser validates the soundness of the assertions and expressions entered by the user against the Java Language Specification. This parser was created from hand-crafted specifications using CUP (v 0.10), a YACC-style parser generator for Java.

The system has a scrollable desktop (the main window of the system). The scrollable property increases the viewable area of the internal windows manifold: no matter how big the internal windows are, the main window can be scrolled to view them in their entirety.

A generic data store (called NumericDataStore<T>, shown in Appendix A) has been created to hold the values, range, and dependency information of both integer and real parameters in the same structure. Appendix A lists the program code.
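The actual NumericDataStore<T> is listed in Appendix A; purely as an illustration of the idea of one generic structure serving both integer and real parameters, a reduced sketch (with hypothetical field names) might look like this:

// Illustrative sketch only: a single generic holder for the numeric properties
// of a parameter, instantiated as NumericDataStore<Integer> or NumericDataStore<Double>.
class NumericDataStore<T extends Number & Comparable<T>> {
    private T specificValue;   // a single fixed value, if the specification gives one
    private T rangeLower;      // lower bound of the valid range, if any
    private T rangeUpper;      // upper bound of the valid range, if any
    private String dependsOn;  // name of the parameter this one depends on, if any

    void setRange(T lower, T upper) { this.rangeLower = lower; this.rangeUpper = upper; }
    void setValue(T value)          { this.specificValue = value; }
    void setDependsOn(String name)  { this.dependsOn = name; }

    boolean contains(T candidate) {
        // A candidate is inside the valid range when both bounds are satisfied.
        return rangeLower != null && rangeUpper != null
                && candidate.compareTo(rangeLower) >= 0
                && candidate.compareTo(rangeUpper) <= 0;
    }
}

Using one parameterised class avoids duplicating the same bookkeeping for Integer and Double parameters.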

Testing

As discussed in the Requirements & Analysis chapter, the testing approach comprises two strategies: 1) testing driven by examples, and 2) usability testing. In the first strategy, four sample programs are analysed and the system's performance is judged by comparing the actual outcome against the expected outcome.

Testing Driven by Examples

The following are sample programs whose specifications are analysed and fed into the system. The outcome of the system is then scrutinized.


Test 1: The Triangle Problem [Myers 2004]
The program reads three integer values from an input dialog. The three values represent the lengths of the sides of a triangle. The program displays a message that states whether the triangle is scalene, isosceles, or equilateral.

Specification Analysis:
Parameter : Type
Side ‘A’ : integer (no specific values or ranges)
Side ‘B’ : integer ( -do- )
Side ‘C’ : integer ( -do- )

Dependency : NONE

Constraints :
A + B > C & B + C > A & C + A > B
A > 0 & B > 0 & C > 0 (logical)

System's Processing:
Partitions of ‘A’: -2147483648, 2147483647, -1, 0, 1, -2147483647, 2147483646
Partitions of ‘B’: -2147483648, 2147483647, -1, 0, 1, -2147483647, 2147483646
Partitions of ‘C’: -2147483648, 2147483647, -1, 0, 1, -2147483647, 2147483646

Sample Test Data Generated (showing only 3 of the test data generated):

Fig. 5.1: Test cases (data) for Triangle problem.


NB: When there is no specific value or range that the system can deduce, it assigns a generic range to the parameter equivalent to the range of the parameter's type. Here 2147483647 is the maximum integer value and -2147483648 is the minimum. All of the partitions are valid if we do not consider the assertions. The system treats test data validity with respect to parameters' values and with respect to assertions separately. It calculates ‘Data Test Fails’ solely with respect to the valid or invalid partitions that constitute the test data: if any of the partitions is invalid, ‘Data Test Fails’ is true and the system outputs ‘YES’; otherwise, if all the partitions that constitute the test data are valid, it outputs ‘NO’. Likewise, if a test data fails with respect to the assertions then ‘Assertion Test Fails’ is true and the system outputs ‘YES’, otherwise ‘NO’. All three test data shown in figure 5.1 happen to have valid values of A, B, and C, so ‘Data Test Fails’ is ‘NO’ in all test cases. It should be pointed out that the second test data does not pass the assertion test, although it might be expected to, since A, B, C > 0. This is because of the way Java evaluates the condition if (A+B>C) when A, B and C are all 2147483647: the addition of A and B overflows the integer range and wraps around to a negative value (-2), which is not greater than C, so the condition evaluates to false (a short demonstration follows below). The rest of the test data generated are as expected and are simply combinations of one of the partitions of each of A, B and C. Please note: the category is ‘Value of Variable’.
SYSTEM TEST PASSES: Partially
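The overflow behaviour referred to above can be reproduced with a few lines of plain Java (this snippet is illustrative only and is not part of the system):

public class OverflowDemo {
    public static void main(String[] args) {
        int a = 2147483647;            // Integer.MAX_VALUE
        int b = 2147483647;
        int c = 2147483647;
        System.out.println(a + b);     // prints -2: the sum wraps around the int range
        System.out.println(a + b > c); // prints false, so the triangle assertion fails
    }
}

Promoting the operands to long (e.g. (long) a + b > c) would avoid the wrap-around and make the assertion behave as intended for such extreme values.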

Test 2: The Find Character Problem [Roper 1997]
The program prompts the user for a positive integer ‘X’ in the range 1-20 and then for a string of characters ‘A’ of that length. It then accepts a character ‘C’ and returns the position in the string at which the character is first found, or otherwise indicates that it is not present.

Specification Analysis:
Parameter : Type
A : string
X : integer
C : character (implemented as a string in this system)

Dependency : length of A = value of X

Constraints : 0 < X < 21

System's Processing:
Partitions of ‘A’:
Category: Length of Variable: -1, 0, 1, 2, 19, 20, and 21


Category: Case of Variable: Upper (denoted by ‘U’) or Lower (denoted by ‘L’)
Category: Content of Variable: matching [A-Za-z0-9]* [^A-Za-z0-9]+ [A-Za-z0-9]* / not matching [A-Za-z0-9]* [^A-Za-z0-9]+ [A-Za-z0-9]*

Partitions of ‘X’: -1, 0, 1, 2, 19, 20, and 21

Partitions of ‘C’:
Category: Length of Variable: -1, 0, 1, and 2
Category: Case of Variable: Upper (denoted by ‘U’) or Lower (denoted by ‘L’)
Category: Content of Variable: matching [A-Za-z0-9]* [^A-Za-z0-9]+ [A-Za-z0-9]* / not matching [A-Za-z0-9]* [^A-Za-z0-9]+ [A-Za-z0-9]*

As mentioned earlier, a string parameter is categorised by three criteria - ‘Length of Variable’, ‘Case of Variable’ and ‘Content of Variable’ - while an integer is categorised by one criterion, ‘Value of Variable’. Please note that the oracle that would substitute the regular expression with representative string values has not been implemented yet.

Sample Test Data Generated (showing only 2 of the test data generated):

Fig. 5.2: Test cases (data) for Find Character problem.

‘Data Test Fails’ is true in the first test case because of invalid partition values of A and C for the category ‘Content of Variable’, whereas in the second test case the test passes because of valid partitions for the same. ‘Assertion Test Fails’ is true in the first test case because of invalid partitions of A and C, whereas it passes in the second test case because of valid partitions for the same.


The rest of the test data generated are as expected and are simply combinations of one of the partitions of each of A, X and C.
SYSTEM TEST PASSES: Completely

Test 3: The Circle Problem (Listing 3.1)
The program sets the parameter ‘radius’ and uses it to calculate the circumference and area of a circle.

Specification Analysis:
Parameter : Type
radius : real (no specific values or ranges)
area : real ( -do- )
circum : real ( -do- )

Dependency : NONE

Constraints : radius > 0 (logical)

Expressions :
area = PI * radius * radius
circum = 2 * PI * radius

System's Processing:
Partitions of ‘radius’: -2147483648, 2147483647, -1, 0, 1, -2147483647, 2147483646
Partitions of ‘area’: -2147483648, 2147483647, -1, 0, 1, -2147483647, 2147483646
Partitions of ‘circum’: -2147483648, 2147483647, -1, 0, 1, -2147483647, 2147483646

Sample Test Data Generated (showing only 2 of the test data generated):

Fig. 5.3: Test cases (data) for Circle problem


NB: As in Test 1, when there is no specific value or range that the system can deduce, it assigns a generic range to the parameter equivalent to the range of the parameter's type, and ‘Data Test Fails’ and ‘Assertion Test Fails’ are calculated separately, as explained there. ‘Data Test Fails’ is false (i.e. the data test passes) in the first test case because of valid partition values of ‘radius’, ‘area’ and ‘circum’, whereas in the second test case the test fails. ‘Assertion Test Fails’ is true in the second test case because of an invalid partition of ‘radius’, whereas it passes (i.e. ‘Assertion Test Fails’ is false) in the first test case because of valid partitions for the same. The rest of the test data generated are as expected and are simply combinations of one of the partitions of each of radius, circum and area.
SYSTEM TEST PASSES: Completely

Test 4: A Hypothetical Problem
The program has been created without any significant application purpose; it is treated here to exercise the features of the project's system completely. In other words, the program's goal is not specified; only its parameters, expressions and assertions are used, to test the expressiveness and adaptability of this system fully.

Specification Analysis:
Parameter : Type
Q : integer (no specific values or ranges)
W : integer ( -do- )
A : real ( -do- )
S : real ( -do- )
Z : string (value = ‘omair’, length = 5 (minimum length), fully lower case)
X : string (value = ‘EEE’, length = 3 (maximum length), fully upper case)
C : string (no specific content or length)
V : string ( -do- )

Dependency :
Q_R = Z_L, X_L (the range of Q is bounded by the lengths of Z and X)
W_R = C_L, A_V (the range of W is bounded by the length of C and the value of A)
A_V = Z_L (the value of A equals the length of Z)
S_V = C_L (the value of S equals the length of C)
Z_V = "omair", Z_L = 5
X_V = "EEE", X_L = 3
C_L = Z_L (the length of C equals the length of Z)
V_L = X_L (the length of V equals the length of X)

Here the subscript ‘V’ denotes the value of a numeric parameter or the content of a non-numeric parameter, ‘R’ denotes the range of a numeric parameter, and ‘L’ denotes the length of a non-numeric parameter. For example, Q_R = Z_L, X_L denotes that the numeric variable ‘Q’ has a range bounded by the lengths of the variables ‘Z’ and ‘X’, and Z_V = "omair", Z_L = 5 denotes that the content of ‘Z’ is ‘omair’ and its length is 5. This example is taken up primarily to test the system's ability to deduce parameters' values, ranges, lengths, etc. even when there is a continuous cycle of dependence, i.e. a dependency cycle (circular dependency).

Constraints : Q + A = W + S

Expressions : A = (Q+W)/S

System's Processing: According to the complex dependency structure, the following are the values that should be deduced by the system:
Q: Range = 3, 5
W: Range = 5, 5
A: Value = 5
S: Value = 5
C: Length = 5
V: Length = 3
Z: Length = 5
X: Length = 3

Sample Test Data Generated (showing only 1 of the test data generated):

Fig. 5.4: Test cases (data) for a hypothetical problem


‘Data Test Fails’ is ‘YES’ (i.e. the data test fails) because of invalid string parameter partitions. ‘Assertion Test Fails’ is ‘YES’ because Q + A != W + S. The rest of the test data generated are as expected and are simply combinations of one of the partitions of each parameter's categories.
SYSTEM TEST PASSES: Completely

Usability Testing

The system was tested by three users, excluding the author of this thesis. Some of the observations are listed below:

Two of the three users found the idea of entering the specifications interactively very interesting and easy to work with, remarking on the convenience of this approach compared with one using a specification language such as Z.

One user found the interaction between her and the system intuitive; however, she had difficulties expressing a program in the way the system treats a program to be tested.

A user suggested that the JUnit test classes that are constructed should run automatically with the test data that has been generated, without requiring the user's intervention to supply parameter values: the automatically supplied data should include both a ‘pass’ test data set and a ‘fail’ test data set. However, this idea prevents a user from trying out test data he has created on his own.

A user who has recently worked very closely with unit testing and mocking frameworks reiterated the previous observation, adding that the JUnit classes should be analogous to a real-life JUnit set-up. This means that the method the JUnit test evaluates (or executes) should live in a separate implementation class, and the JUnit test should exercise that method from outside that class. This would be closer to how JUnit is used in practice.

A user highlighted the need for a more comprehensive user manual and a help system. The current tutorial does not help a user navigate the system if he gets stuck while working with it.


Chapter 6: Results & Discussion

This chapter focuses on analysing and evaluating the results of the project. The first part corresponds to the results of testing. Then, the achievements are critically discussed. Finally, some ideas for improvement are developed.

Results

The system was tested using four sample programs, each representing a completely different circumstance and set of requirements. The objective of using such sample programs in testing is to evaluate the adaptability of the system to the various circumstances that a program to be tested might present. The Triangle Problem [Myers 2004] provided the motivation for improvising the mechanism of constraints or assertions. These assertions help in narrowing the range of values the parameters of the program to be tested can receive. This produces partitions of smaller magnitude and usually smaller interval values, thereby making the figures more comprehensible; the gain is immediate, as users can easily understand the difference in the range of partition values. The Triangle Problem was only partially successful: even though the program to be tested has constraints, the ranges of valid partitions were too large, and consequently so were the partition figures themselves. The Find Character problem provided the motivation for parameter dependence mapping. It is a classic example that depicts how parameters of different types can be interdependent; a successful completion of this program reveals the degree of expressiveness modelled in the system. The Circle Problem is an example taken from the lecture notes of the first Java programming course (COM1010). It represents the common type of program usually dealt with by novice programmers; the successful test demonstrates that the system can reasonably handle program code of the complexity used by the system's intended users. The final sample program fully demonstrates the level of complexity of dependency calculation the system can resolve. A hypothetical problem was considered to check the system's expressiveness and the complexity it can handle; a successful test shows that the system can adequately handle the dependence complexity between parameters and can act suitably in situations like circular dependency (explained later).

Test Number | Test Name | Result
1. | The Triangle Problem | Partially Successful
2. | The Find Character Problem | Completely Successful
3. | The Circle Problem | Completely Successful
4. | A hypothetical problem | Completely Successful

Table 6.1: The results of the testing

From table 6.1, we find that the system passes 75% of the tests completely. This result, coupled with the fact that it achieves all the objectives of the system, makes the system a highly successful one. Moreover, the usability testing shows a 100% system intuitiveness score and a 66% ease-of-use score.


Achievements & Discussions

In this section, we critically evaluate the main achievements of the system:

The system is a purposeful learning-cum-practical test tool, as it puts across the notion of testing, i.e. to find errors. It highlights the common flaws of a novice programmer, like the Divide By Zero error and the Invalid Input error, and creates test data based upon boundary value analysis, equivalence partitioning and category-partition methods. These data give the users an insight into the various kinds of data a program may have to handle and show that, without adequate checks for common flaws, it would remain error prone. It also highlights the craft of writing JUnit tests for program code.

The system promotes Test Driven Development as it forces the user to think about the tests before beginning to code. The tests have to be developed based on the specifications, so the user already has a good understanding of the inputs to and outputs from the system, and thereby writes tests independent of the way the functionality is implemented.

The system does not burden users with having to understand a specification language such as Z. It allows the user to express the specifications of a program to be tested predominantly by selecting options rather than by extensive typing.

When recursive dependencies exist in a program's specifications, a cycle of dependencies often occurs. This can be explained by assuming that there are three parameters in a program to be tested, named A, B and C. If a property of A (value, length, content, etc.) depends on a property of B, this property in turn depends on a property of C, and that property is further dependent on the same property of A with which the dependency started, then this kind of situation constitutes a dependency cycle, or circular dependency. The system tackles such a situation by stopping the search for the values of the properties of A, B and C the moment it finds a parameter repeated in the recursive search; it then allocates a generic range of values to that property depending on the type of the parameter. For example, if the value of integer variable A is equal to the length of a string variable B, the length of B is equal to the value of C, and the value of C is equal to the value of A, then the value of A cannot be a specific value; rather, the entire range of integers could be the valid values of A.
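The cycle-handling rule described above amounts to a depth-first walk of the dependency chain with a 'visited' set. A minimal sketch (hypothetical names; the system's actual resolution classes appear in figure 4.8) could look like this:

import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DependencyResolver {
    // Maps a parameter name to the name of the parameter it depends on (null if independent).
    private final Map<String, String> dependsOn;

    DependencyResolver(Map<String, String> dependsOn) { this.dependsOn = dependsOn; }

    // Follows the dependency chain; returns null when a cycle is detected,
    // signalling that a generic (type-wide) range should be assigned instead.
    String resolve(String parameter) {
        Set<String> visited = new HashSet<String>();
        String current = parameter;
        while (dependsOn.get(current) != null) {
            if (!visited.add(current)) {
                return null; // parameter repeated: circular dependency
            }
            current = dependsOn.get(current);
        }
        return current; // an independent parameter whose concrete property can be read
    }
}

With A -> B -> C -> A the walk revisits A, resolve("A") returns null, and the caller falls back to the full integer range, exactly as in the example above.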

Further Work

Owing to time constraints, not all the desired features of the system currently have the best possible implementation. The following is a list of suggestions whose implementation would most likely improve the system:

The current system supports only integer, real, date and string parameter types. This capacity can be extended to include lists and abstract data types.


A mechanism to constrain the potentially very large number of generated test data is absent. For example, even a program with 4 string parameters, 2 integer parameters and 2 real parameters (the situation depicted in Test 4 of Chapter 5) would have about 6000 test cases. This situation is likely to improve if a further constraining mechanism is introduced.

If all of the potentially numerous test cases were computed and stored in a suitable data structure, this would consume a lot of system resources. Hence, the system incorporates a pooling sub-system which pools test cases (depending on the number of test cases the user wishes to see) and displays them. This pooling algorithm selects one partition at a time for each category of each parameter in order to create a test data set, and it must ensure combinatorial selection of each partition of a category of one parameter with every other partition of the categories of the other parameters. The pooling algorithm for efficiently selecting varied test data is currently only partially effective; a sketch of the underlying combinatorial selection follows.
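The combinatorial selection the pooling sub-system has to guarantee is essentially a Cartesian product over the partition lists of every category of every parameter. A minimal sketch (hypothetical names, with each category represented simply as a list of partition labels) is:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TestFramePool {
    // Produces every combination that picks exactly one partition from each category.
    static List<List<String>> combine(List<List<String>> categories) {
        List<List<String>> frames = new ArrayList<List<String>>();
        frames.add(new ArrayList<String>());          // start with one empty frame
        for (List<String> partitions : categories) {
            List<List<String>> extended = new ArrayList<List<String>>();
            for (List<String> frame : frames) {
                for (String partition : partitions) { // extend each frame by each partition
                    List<String> next = new ArrayList<String>(frame);
                    next.add(partition);
                    extended.add(next);
                }
            }
            frames = extended;
        }
        return frames;
    }

    public static void main(String[] args) {
        // Two categories with 3 and 2 partitions give 3 * 2 = 6 test frames.
        System.out.println(combine(Arrays.asList(
                Arrays.asList("lenMin", "lenOk", "lenMax"),
                Arrays.asList("upper", "lower"))));
    }
}

Because the product grows multiplicatively with each additional category, a pooling step that samples from this set (rather than materialising it in full) is what keeps memory use manageable.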

An oracle for generating string values of particular types, for use in the generated test data, is yet to be implemented. These generated string values would replace the regular expressions matching the content of the string.


Chapter 7: Conclusion

This chapter concludes the thesis with a summary of the project and its system.

The aim of this project was to develop a 'learning cum testing' tool that inculcates the notion of testing among the students of the first Java programming course of the Department of Computer Science, University of Sheffield. It is a small effort to redress the observation in [Myers 2004]: "Testing has been an out-of-vogue subject". Of course, the motivation lies in: "Our students graduate and move into industry without any substantial knowledge of how to go about testing a program" [Myers 2004]. The project can broadly be divided into two goals: 1) to provide a tutorial system that guides the students in developing and honing their craft of testing; 2) to provide a test generator and executor tool that helps students learn how to test program code. The second part helps students gain an insight into the faults that may commonly occur, by observing the types of checks the system makes and the kind of test data generated to execute the program under test. This improves the students' awareness of common faults and helps them write less faulty code as a result.

The approach undertaken by the system is deliberately simple, as it starts by considering simple classes and functions in order to write tests. This helps develop an approach to writing tests in a structured manner. Tacitly, the project seeks to promote the Test Driven Development culture prevalent in the software industry. The project is based on the concepts of XP, where the stress is on a 'test-first' strategy. It generates tests based on the specifications of a program's functional units. Some of the features of the system are: it provides automatic pinpointing of potential flaws, like the Divide by Zero error, with respect to the expressions used in the program to be tested; it automatically generates critical test data that can be executed against the developed code of the program under test to check for errors; and it provides simple JUnit tests for the expressions used, and for assertions on the parameters, in the program under test. The system is built on a framework that provides a scrollable desktop. This allows the internal user-interface windows much freedom of size and location and also increases the convenience of window management. The system uses the services of a lexical analyzer and a parser to identify expressions and assertions submitted by the user. The lexical analyzer identifies whether the elements are valid Java identifiers, tokens, operators, etc., whereas the parser validates the syntax of the expressions and assertions.

The system was tested with four sample tests of varying requirements and conditions. These sample tests imposed varying requirements on the capability of the system, in order to adapt better to the different kinds of programs that may need to be tested. The system was found to pass 75% of the tests completely. This result suggests the expressiveness and generality built into the system. Moreover, the system achieves its overall objectives in providing both a learning tool (accompanied by a tutorial manual) and a practical test demonstration tool. The usability tests showed a 100% system intuitiveness score and a 66% ease-of-use score; this inference was drawn by having the usability of the system tested by three users.


Several ideas emerged for improving the current system. First, the working set could be extended from integer, real, string and date parameters to include arrays and abstract data types. Second, a more effective constraint mechanism could further reduce the number of test cases produced while maintaining the same quality. Third, test data could be pooled using an improved algorithm that manages the complexity of the task while achieving a more varied combination of test partitions (a naive sketch of such pooling is given below). This project opens the way to introducing software testing, at its most basic level, into the foundation courses of a software engineering degree, and to a new way of learning.
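The pooling idea can be made more concrete with a small sketch. The code below only illustrates the naive cross-product combination of test partitions that an improved algorithm would have to prune; the partition values and parameter names are invented for the example and are not part of the current system.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Naive pooling of test data: every value of one partition is paired with
// every value of the other. An improved algorithm would prune this set
// while keeping the same coverage of partition combinations.
public class PartitionPoolingSketch {

    public static void main(String[] args) {
        List<Integer> numeratorPartition = Arrays.asList(-1, 0, 1, Integer.MAX_VALUE);
        List<Integer> denominatorPartition = Arrays.asList(-2, 1, Integer.MAX_VALUE);

        for (int[] testCase : crossProduct(numeratorPartition, denominatorPartition)) {
            System.out.println(testCase[0] + " / " + testCase[1]);
        }
    }

    // Builds every pairing of the two partitions.
    private static List<int[]> crossProduct(List<Integer> first, List<Integer> second) {
        List<int[]> combinations = new ArrayList<int[]>();
        for (Integer a : first) {
            for (Integer b : second) {
                combinations.add(new int[] { a.intValue(), b.intValue() });
            }
        }
        return combinations;
    }
}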



References

[Barnat 2002] J. Barnat, T. Brazdil, P. Krcal, V. Rehak and D. Safranek, "Model Checking in IPv6 Hardware Router Design" [Online], 2002. Available: http://www.cesnet.cz/doc/techzpravy/2002/ipv6hwdesign/ [accessed 27/05/2007].

[Barnett 2004] M. Barnett, K. R. M. Leino and W. Schulte, "The Spec# programming system: An overview", in Construction and Analysis of Safe, Secure and Interoperable Smart Devices (CASSIS), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2004.

[Beck 2002] K. Beck, Test-Driven Development: By Example, first edition, Boston: Addison-Wesley, 2002.

[Beck 2004] K. Beck, Extreme Programming Explained: Embrace Change, second edition, New York: Addison-Wesley, 2004.

[Becker 2002] P. Becker, "Eliminating Functional Defects through Model-Based Testing" [Online], 2002. Available: http://www.stickyminds.com/s.asp?F=S5995_ART_2 [accessed 25/08/2007].

[Beizer 1990] B. Beizer, Software Testing Techniques, second edition, New York: International Thomson Computer Press, 1990.

[Clarke 1999] E. M. Clarke, O. Grumberg and D. A. Peled, Model Checking, Cambridge, MA: The MIT Press, 1999.

[Dale 2000] N. Dale et al., "The assimilation of software engineering into the undergraduate computer science curriculum", in Proc. 31st SIGCSE Technical Symposium on Computer Science Education, Austin, TX, May 2000, pp. 423-424.

[Deason 1991] W. H. Deason, D. B. Brown, K. H. Chang and J. H. Cross, "A rule-based software test data generator", IEEE Transactions on Knowledge and Data Engineering, vol. 3, no. 1, pp. 108-117, March 1991.

[DeMillo 1978] R. A. DeMillo, R. J. Lipton and F. G. Sayward, "Hints on test data selection: Help for the practicing programmer", IEEE Computer, vol. 11, no. 4, pp. 34-41, April 1978.

[Duran 1984] J. W. Duran and S. C. Ntafos, "An Evaluation of Random Testing", IEEE Transactions on Software Engineering, vol. SE-10, no. 4, pp. 438-444, July 1984.

[Dustin 2003] E. Dustin, Effective Software Testing: 50 Specific Ways to Improve Your Testing, first edition, Boston: Pearson Education, 2002.

[Gallagher 1993] M. J. Gallagher and V. L. Narasimhan, "A software system for the generation of test data for ADA programs", Microprocessing and Microprogramming, vol. 38, pp. 637-644, 1993.


[Goldwasser 2002] M. H. Goldwasser, "A gimmick to integrate software testing throughout the curriculum", in Proc. 33rd SIGCSE Technical Symposium on Computer Science Education, Cincinnati, Kentucky, February 27-March 3, 2002.

[Gutjahr 1993] W. Gutjahr, "Automatische Testdatengenerierung zur Unterstuetzung des Softwaretests", Informatik Forschung und Entwicklung, vol. 8, part 3, pp. 128-136, 1993.

[Hamlet 1987] R. G. Hamlet, "Probable correctness theory", Information Processing Letters, vol. 25, pp. 17-25, 1987.

[Henzinger 2003] T. A. Henzinger, R. Jhala, R. Majumdar and G. Sutre, "Software verification with Blast", in Proc. 10th International SPIN Workshop on Model Checking of Software, LNCS vol. 2648, 2003.

[Huang 1975] J. C. Huang, "An approach to program testing", ACM Computing Surveys, vol. 7, part 3, pp. 113-128, 1975.

[Jalote 2005] P. Jalote, An Integrated Approach to Software Engineering, third edition, New York: Springer, 2005.

[JUnit 2003] JUnit, 2003. http://www.junit.org.

[Khurshid 2002] C. Boyapati, S. Khurshid and D. Marinov, "Korat: Automated testing based on Java predicates", in Proc. International Symposium on Software Testing and Analysis, pp. 123-133, 2002.

[Khurshid 2003] S. Khurshid, C. S. Pasareanu and W. Visser, "Generalized symbolic execution for model checking and testing", in Proc. 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 553-568, April 2003.

[King 1976] J. C. King, "Symbolic execution and program testing", Communications of the ACM, vol. 19, no. 7, pp. 385-394, 1976.

[Korel 1990] B. Korel, "Automated software test data generation", IEEE Transactions on Software Engineering, vol. 16, no. 8, pp. 870-879, August 1990.

[McMinn 2004] P. McMinn, "Search-based software test data generation: A survey", Software Testing, Verification and Reliability, vol. 14, no. 2, pp. 105-156, June 2004.

[Meyer 1998] B. Meyer, Object-Oriented Software Construction, Series in Computer Science, Prentice-Hall International, 1988.

[Morell 1990] L. J. Morell, "A theory of fault-based testing", IEEE Transactions on Software Engineering, vol. 16, no. 8, pp. 844-857, August 1990.

[Myers 2004] G. J. Myers, The Art of Software Testing, revised second edition, New Jersey: John Wiley & Sons, 2004.


[Ould 1991] M. A. Ould, "Testing - a challenge to method and tool developers", Software Engineering Journal, pp. 59-64, March 1991.

[POOC 2002] H. Schlenker and G. Ringwelski, "POOC: A platform for object-oriented constraint programming", in Proc. 2002 International Workshop on Constraint Solving and Constraint Logic Programming, pp. 159-170, June 2002.

[Roper 1997] M. Roper, "Computer aided software testing using genetic algorithms", in Proc. 10th International Software Quality Week, 1997.

[Staknis 1990] M. E. Staknis, "Software quality assurance through prototyping and automated testing", Information and Software Technology, vol. 32, pp. 26-33, 1990.

[Sthamer 1995] H. H. Sthamer, "The Automatic Generation of Software Test Data Using Genetic Algorithms", PhD thesis, University of Glamorgan, 1995.

[Symstra 2005] T. Xie, D. Marinov, W. Schulte and D. Notkin, "Symstra: A framework for generating object-oriented unit tests using symbolic execution", in Proc. TACAS 2005 (11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems), LNCS vol. 3440, pp. 365-381, 2005.

[Visser 2004] W. Visser, C. S. Pasareanu and S. Khurshid, "Test input generation with Java PathFinder", in Proc. 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 97-107, 2004.

[Walkinshaw 2006] N. Walkinshaw, K. Bogdanov and M. Holcombe, "Identifying State Transitions and their Functions in Source Code", in Proc. Testing: Academic & Industrial Conference on Practice and Research Techniques, pp. 49-58, 2006.

[Wegener 2001] J. Wegener, A. Baresel and H. Sthamer, "Evolutionary test environment for automatic structural testing", Information and Software Technology, vol. 43, no. 14, pp. 841-854, 2001.

[Weichselbaum 1998] R. Weichselbaum, "Software test automation by means of genetic algorithms", in Proc. 6th International Conference on Software Testing, Analysis and Review, 1998.

[WWW1] Model View Controller (MSDN article) [Online]. Available: http://msdn2.microsoft.com/en-us/library/ms978748.aspx [accessed 27/05/2007].

[Yang 1995] X. Yang, B. F. Jones and D. Eyres, "The automatic generation of software test data from Z specifications", Research Project Report III, CS-95-2, February 1995.

Appendix B


A generic data store

public class NumericDataStore<T> {

    private T numericValue;
    private T lowerRangeValue;
    private T upperRangeValue;
    private boolean valueSet;
    private boolean rangeSet;
    private boolean valueUndefined;
    private String[] valueDependsOn = new String[] { "Nothing", "Nothing" };
    private String[] lowerRangeDependsOn = new String[] { "Nothing", "Nothing" };
    private String[] upperRangeDependsOn = new String[] { "Nothing", "Nothing" };

    public NumericDataStore() {
        numericValue = null;
        lowerRangeValue = null;
        upperRangeValue = null;
        valueSet = false;
        rangeSet = false;
        valueUndefined = false;
    }

    public NumericDataStore(T value, T lowerRange, T upperRange) {
        numericValue = value;
        lowerRangeValue = lowerRange;
        upperRangeValue = upperRange;
        valueSet = true;
        rangeSet = true;
        valueUndefined = false;
    }

    /** @return the numericValue */
    public T getNumericValue() {
        return numericValue;
    }

    /** @param numericValue the numericValue to set */
    public void setNumericValue(T numericValue) {
        this.numericValue = numericValue;
    }

    /** @return the lowerRangeValue */
    public T getLowerRangeValue() {
        return lowerRangeValue;
    }


    /** @param lowerRangeValue the lowerRangeValue to set */
    public void setLowerRangeValue(T lowerRangeValue) {
        this.lowerRangeValue = lowerRangeValue;
    }

    /** @return the upperRangeValue */
    public T getUpperRangeValue() {
        return upperRangeValue;
    }

    /** @param upperRangeValue the upperRangeValue to set */
    public void setUpperRangeValue(T upperRangeValue) {
        this.upperRangeValue = upperRangeValue;
    }

    /** @return the rangeSet */
    public boolean isRangeSet() {
        return rangeSet;
    }

    /** @param rangeSet the rangeSet to set */
    public void setRangeSet(boolean rangeSet) {
        this.rangeSet = rangeSet;
    }

    /** @return the valueSet */
    public boolean isValueSet() {
        return valueSet;
    }

    /** @param valueSet the valueSet to set */
    public void setValueSet(boolean valueSet) {
        this.valueSet = valueSet;
    }

    /** @return the lowerRangeDependsOn */
    public String[] getLowerRangeDependsOn() {
        return lowerRangeDependsOn;
    }

    /** @param lowerRangeDependsOn the lowerRangeDependsOn to set */
    public void setLowerRangeDependsOn(String[] lowerRangeDependsOn) {
        this.lowerRangeDependsOn = lowerRangeDependsOn;
    }


    /** @return the upperRangeDependsOn */
    public String[] getUpperRangeDependsOn() {
        return upperRangeDependsOn;
    }

    /** @param upperRangeDependsOn the upperRangeDependsOn to set */
    public void setUpperRangeDependsOn(String[] upperRangeDependsOn) {
        this.upperRangeDependsOn = upperRangeDependsOn;
    }

    /** @return the valueDependsOn */
    public String[] getValueDependsOn() {
        return valueDependsOn;
    }

    /** @param valueDependsOn the valueDependsOn to set */
    public void setValueDependsOn(String[] valueDependsOn) {
        this.valueDependsOn = valueDependsOn;
    }

    /** @return the valueUndefined */
    public boolean isValueUndefined() {
        return valueUndefined;
    }

    /** @param valueUndefined the valueUndefined to set */
    public void setValueUndefined(boolean valueUndefined) {
        this.valueUndefined = valueUndefined;
    }
}
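The appendix does not show how the class is called; the fragment below is a hypothetical usage example of the data store, with an invented parameter ("age"), an invented range and invented dependency names, intended only to show how the value, range and dependency fields might be read and written.

// Hypothetical usage example; not part of the original appendix code.
public class NumericDataStoreDemo {

    public static void main(String[] args) {
        // Store the value 25 constrained to the (invented) range 0..120.
        NumericDataStore<Integer> ageStore = new NumericDataStore<Integer>(25, 0, 120);

        if (ageStore.isValueSet() && ageStore.isRangeSet()) {
            System.out.println("value " + ageStore.getNumericValue()
                    + " lies in [" + ageStore.getLowerRangeValue()
                    + ", " + ageStore.getUpperRangeValue() + "]");
        }

        // Record that the value is derived from two other (invented) parameters.
        ageStore.setValueDependsOn(new String[] { "dateOfBirth", "currentDate" });
    }
}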