A systematic mapping study of software product lines testing

Paulo Anselmo da Mota Silveira Neto a,b,*, Ivan do Carmo Machado a,b, John D. McGregor d, Eduardo Santana de Almeida a,c, Silvio Romero de Lemos Meira a,b

a RiSE - Reuse in Software Engineering, Recife, PE, Brazil
b Informatics Center, Federal University of Pernambuco, Recife, PE, Brazil
c Computer Science Department, Federal University of Bahia, Salvador, BA, Brazil
d Computer Science Department, Clemson University, Clemson, SC, USA

* Corresponding author at: RiSE - Reuse in Software Engineering, Recife, PE, Brazil. E-mail address: [email protected] (P.A. da Mota Silveira Neto).

Information and Software Technology 53 (2011) 407-423. doi:10.1016/j.infsof.2010.12.003. © 2010 Elsevier B.V. Open access under the Elsevier OA license. Journal homepage: www.elsevier.com/locate/infsof

Article history: Received 9 March 2010; Received in revised form 30 November 2010; Accepted 4 December 2010; Available online 16 December 2010.

Keywords: Software product lines; Software testing; Mapping study

Abstract

Context: In software development, testing is an important mechanism both to identify defects and to assure that completed products work as specified. This is a common practice in single-system development, and it continues to hold in Software Product Lines (SPL). Even though extensive research has been done in the SPL Testing field, it is necessary to assess the current state of research and practice, in order to provide practitioners with evidence that enables fostering its further development.

Objective: This paper focuses on testing in SPL and has the following goals: investigate state-of-the-art testing practices, synthesize the available evidence, and identify gaps between required techniques and the approaches available in the literature.

Method: A systematic mapping study was conducted around a set of nine research questions, in which 120 studies, dated from 1993 to 2009, were evaluated.

Results: Although several aspects of testing are covered by single-system development approaches, many cannot be directly applied in the SPL context due to its specific issues. In addition, particular aspects of SPL are not covered by the existing SPL approaches, and when they are covered, the literature gives only brief overviews. This scenario indicates that additional investigation, empirical and practical, should be performed.

Conclusion: The results can help in understanding the needs in SPL Testing by identifying points that still require additional investigation, since important aspects of software product lines have not been addressed yet.

Contents

1. Introduction
2. Related work
3. Literature review method
4. Research directives
   4.1. Protocol definition
   4.2. Question structure
   4.3. Research questions
5. Data collection
   5.1. Search strategy
   5.2. Data sources
   5.3. Studies selection
        5.3.1. Reliability of inclusion decisions
   5.4. Quality evaluation
   5.5. Data extraction
6. Outcomes
   6.1. Classification scheme
   6.2. Results (6.2.1 Testing strategy; 6.2.2 Static and dynamic analysis; 6.2.3 Testing levels; 6.2.4 Regression testing; 6.2.5 Non-functional testing; 6.2.6 Commonality and variability testing; 6.2.7 Variant binding time; 6.2.8 Effort reduction; 6.2.9 Test measurement)
   6.3. Analysis of the results and mapping of studies (6.3.1 Main findings of the study)
7. Threats to validity
8. Concluding remarks and future work
Acknowledgments
Appendix A. Quality studies scores
Appendix B. List of conferences
Appendix C. List of journals
References




1. Introduction

The increasing adoption of Software Product Lines practices in industry has yielded decreased implementation costs, reduced time to market and improved quality of derived products [17,63]. In this approach, as in single-system development, testing is essential [36] to uncover defects [68,75]. A systematic testing approach can save significant development effort, increase product quality, raise customer satisfaction and lower maintenance costs [32].

As defined in [54], testing in SPL aims to examine the core assets shared by the many products derived from a product line, their individual parts and the interactions among them. Thus, testing in this context encompasses activities from the validation of the initial requirements to activities performed by customers to complete the acceptance of a product, and it confirms that testing is still the most effective method of quality assurance, as observed in [46].

However, despite the benefits mentioned above, the state of software testing practice is, in general, not as advanced as that of software development techniques [32], and the same holds true in the SPL context [37,79]. From an industry point of view, with the growing SPL adoption by companies [81], more efficient and effective testing methods and techniques for SPL are needed, since the currently available techniques, strategies and methods make testing a very challenging process [46]. Moreover, the SPL Testing field has attracted the attention of many researchers in recent years, which has resulted in a large number of publications on both general and specific issues. However, while the literature provides many approaches, strategies and techniques, it offers surprisingly little in the way of widely known empirical assessment of their effectiveness. Hence, the goal of this investigation is to identify, evaluate, and synthesize state-of-the-art testing practices in order to present what has been achieved so far in this discipline. We are also interested in identifying practices adopted in single-system development that may be suitable for SPL.

This paper presents a systematic mapping study [67], performed in order to map out the SPL Testing field, synthesizing evidence to suggest important implications for practice, as well as identifying research trends, open issues, and areas for improvement. A mapping study [67] is an evidence-based approach, applied in order to provide an overview of a research area and to identify the quantity and type of research and results available within it. The results are gained from a defined approach to locate, assess and aggregate the outcomes from relevant studies, thus providing a balanced and objective summary of the relevant evidence. The study also highlights the gaps and identifies trends for research and development. Moreover, it is based on the analysis of interesting issues, guided by a set of research questions. This systematic mapping process was conducted from July to December 2009.

The remainder of this paper is organized as follows: Section 2 presents the related work. In Section 3 the method used in this study is described. Section 4 presents the planning phase and the research questions addressed by this study. Section 5 describes its execution, presenting the search strategy used and the resultant selected studies. Section 6 presents the classification scheme adopted in this study and reports the findings. In Section 7 the threats to validity are described. Section 8 draws some conclusions and provides recommendations for further research on this topic.

2. Related work

As mentioned before, the literature on SPL Testing provides a large number of studies, on both general and specific issues, as will be discussed later in this study. Among them, we have identified some studies developed in order to gather and evaluate the available evidence in the area. They are thus considered as having similar ideas to our mapping study and are described next.

A survey on SPL Testing was performed by Tevanlinna et al. [79]. They studied approaches to product line testing methodology and processes that have been developed for, or that can be applied to, SPL, laying emphasis on regression testing. The study also evaluates the state of the art in SPL testing up to the date of the paper, 2004, and highlighted problems to be addressed.

A thesis on SPL Testing published in 2007 by Edwin [20] investigated testing in SPL and possible improvements in testing steps, tool selection and application in SPL testing. It was conducted using the systematic review approach.

A systematic review was performed by Lamancha et al. [48] and published in 2009. Its main goal was to identify experience reports and initiatives carried out in Software Engineering related to testing in software product lines. In order to accomplish that, the authors classified the primary studies into seven categories: unit testing, integration testing, functional testing, SPL architecture, embedded systems, testing process, and testing effort in SPL. After that, a summary of each area was presented.

[Fig. 1. The systematic mapping process (adapted from Petersen et al. [67]).]

These studies can be considered good sources of information on this subject. In order to develop our work, we considered every mentioned study, since they bring relevant information. However, we have noticed that important aspects, such as regression testing, testing of non-functional requirements and the relation between variant binding time and testability, were not covered by them to an extent that would make it possible to map out the current status of research and practice in the area. Thus, we categorized a set of important research areas under SPL Testing, focusing on the aspects addressed by the studies mentioned before as well as on the areas they did not address but which are directly related to SPL practices, in order to perform critical analysis and appraisal. In order to accomplish our goals in this work, we followed the guidelines for mapping study development presented in [12]. We also included threat mitigation strategies in order to obtain the most reliable results.

We believe our study presents current and relevant information on research topics that can complement others previously published. By current, we mean that, as the number of studies published has increased rapidly, as shown in Fig. 4, it justifies the need for more up-to-date empirical research in this area to contribute to the community investigations.

3. Literature review method

The method used in this research is a Systematic Mapping Study (henceforth abbreviated to MS) [12,67]. An MS provides a systematic and objective procedure for identifying the nature and extent of the empirical study data that is available to answer a particular research question [12].

While a systematic review is a means of identifying, evaluating and interpreting all available research relevant to a particular question [41], an MS intends to map out the research undertaken rather than to answer a detailed research question [12,67]. A well-organized set of good practices and procedures for undertaking MS in the software engineering context is defined in [12,67], which establishes the basis for the study presented in this paper. It is worthwhile to highlight that the importance and use of MS in the software engineering area is increasing [1,5,12,15,33,40,67,71], showing the relevance and potential of the method. Nevertheless, just as with systematic reviews [10,13,51,56,78], more MS related to software product lines are needed in order to evolve the field with more evidence [43].

An MS comprises the analysis of primary studies that investigate aspects related to predefined research questions, aiming at integrating and synthesizing evidence to support or refute particular research hypotheses. The main reasons to perform an MS, as defined by Budgen et al. [12], are:

- To make an unbiased assessment of as many studies as possible, identifying existing gaps in current research and contributing to the research community with a reliable synthesis of the data;
- To provide a systematic procedure for identifying the nature and extent of the empirical study data that is available to answer research questions;
- To map out the research that has been undertaken;
- To help to plan new research, avoiding unnecessary duplication of effort and error;
- To identify gaps and clusters in a set of primary studies, in order to identify topics and areas in which to perform more complete systematic reviews.

[Fig. 2. Stages of the selection process.]
[Fig. 3. Primary studies filtering categorized by source.]
[Fig. 4. Distribution of primary studies by their publication years.]

The experimental software engineering community is working towards the definition of standard processes for conducting mapping studies. This effort can be seen in Petersen et al. [67], a study describing how to conduct systematic mapping studies in software engineering. The paper provides a well-defined process which serves as a starting point for our work. We merged ideas from Petersen et al. [67] with good practices defined in the guidelines published by Kitchenham and Charters [41]. This way, we could apply a mapping study process that includes good practices from systematic reviews, making better use of both techniques.

This blending process enabled us to include topics not covered by Petersen et al. [67] in their study, such as:
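To make the idea of "mapping" concrete, the following sketch tallies primary studies across a two-facet classification scheme. The studies, topics and research types below are hypothetical examples, not data from this paper; the point is only that the map is a frequency count over facet pairs, which is what the bubble charts in mapping studies visualize.

```python
from collections import Counter

# Hypothetical primary studies, each classified along two facets:
# a topic facet (derived from the research questions) and a
# research-type facet (e.g. as in Petersen et al.'s scheme).
studies = [
    {"id": "S01", "topic": "testing levels",     "type": "evaluation"},
    {"id": "S02", "topic": "regression testing", "type": "solution proposal"},
    {"id": "S03", "topic": "testing levels",     "type": "solution proposal"},
    {"id": "S04", "topic": "non-functional",     "type": "experience report"},
]

# The "map" is simply the number of studies in each facet pair.
mapping = Counter((s["topic"], s["type"]) for s in studies)

for (topic, rtype), count in sorted(mapping.items()):
    print(f"{topic:20s} x {rtype:20s}: {count}")
```

Plotting these counts with one facet per axis and one bubble per non-zero cell yields the classic systematic-map visualization, making gaps (empty cells) and clusters (large bubbles) immediately visible.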

- Protocol. This artifact was adopted from the systematic review guidelines. Our initial activity in this study was to develop a protocol, i.e. a plan defining the basic mapping study procedures. Searching the literature, we noticed that some studies created a protocol (e.g. [2]), while others did not (e.g. [15,67]). Even though this is not a mandatory artifact, as mentioned by Petersen et al. [67], authors who created a protocol in their studies encourage its use as an important means to evaluate and calibrate the mapping study process.
- Collection form. This artifact was also adopted from the systematic review guidelines, and its main purpose is to help the researchers collect all the information needed to address the review questions, study quality criteria and classification scheme.
- Quality criteria. The purpose of quality criteria is to evaluate the studies, as a means of weighting their relevance against each other. Quality criteria are commonly used when performing systematic literature reviews. The quality criteria were evaluated independently by two researchers, reducing the likelihood of erroneous results.

Some elements, as proposed by Petersen et al. [67], were also changed and/or rearranged in this study, such as:

- Phasing the mapping study. As can be seen in Fig. 1, the process was explicitly split into three main phases: (1) Research directives, (2) Data collection and (3) Results. This is in line with systematic review practices [41], which define planning, conducting and reporting phases. The phases are named differently from what is defined for systematic reviews, but the general idea and objective of each phase was followed. In the first phase, the protocol and the research questions are established. This is the most important phase, since the research goal is satisfied by the answers to these questions. The second phase comprises the execution of the MS, in which the search for primary studies is performed. This considers a set of inclusion and exclusion criteria, used in order to select studies that may contain relevant results according to the goals of the research. In the third phase, the classification scheme is developed. It was built considering two facets: one structured the topic in terms of the research questions, and the other considered different research types as defined in [67]. The results of a meticulous analysis performed on every selected primary study are reported in the form of a mapping study. All phases are detailed in the next sections.

4. Research directives

This section presents the first phase of the mapping study process, in which the protocol and research questions are defined.

4.1. Protocol definition

The protocol forms the research plan for an empirical study, and is an important resource for anyone who is planning to undertake a study or considering performing any form of replication study.

In this study, the purpose of the protocol is to guide the research objectives and clearly define how the study should be performed, by defining research questions and planning how the sources and selected studies will be used to answer those questions. Moreover, the classification scheme adopted in this study was defined beforehand and documented in the protocol.

Incremental reviews of the protocol were performed in accordance with the MS method. The protocol was revisited in order to update it based on new information collected as the study progressed. To avoid duplication, we detail the content of the protocol in Section 5, as we describe how the study was conducted.
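The collection form artifact mentioned above can be pictured as one structured record per primary study. The sketch below is purely illustrative: the field names and the scoring rule are assumptions for exposition, not the authors' actual form, which is defined in their protocol.

```python
from dataclasses import dataclass, field

@dataclass
class CollectionForm:
    """Illustrative record for one primary study (hypothetical fields)."""
    study_id: str
    title: str
    year: int
    source: str                                         # conference, journal, grey literature
    answers: dict = field(default_factory=dict)         # research question -> extracted finding
    quality_scores: list = field(default_factory=list)  # one independent score per reviewer

    def quality(self) -> float:
        # Combine the reviewers' independent scores by simple averaging.
        return sum(self.quality_scores) / len(self.quality_scores)

# Example usage with made-up data:
form = CollectionForm("S01", "Testing variability in product lines", 2008,
                      "journal", {"Q1": "core-asset testing"}, [4, 5])
print(form.quality())  # 4.5
```

Keeping one such record per study is what later makes the classification scheme and quality weighting mechanical rather than ad hoc.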

    4.2. Question structure

    The research questions were framed by three criteria:

Population. Published scientific literature reporting on software testing and SPL testing.

Intervention. Empirical studies involving SPL Testing practices, techniques, methods and processes.

Outcomes. Type and quantity of evidence relating to the various SPL testing approaches, in order to identify practices, activities and research issues concerning this area.

    4.3. Research questions

As previously stated, the objective of this study is to understand, characterize and summarize evidence, identifying activities as well as practical and research issues regarding research directions in SPL Testing. We focused on identifying how the existing approaches deal with testing in SPL. In order to define the research questions, our efforts were based on topics addressed by previous research on SPL testing [20,46,79]. In addition, the research question definition task was aided by discussions with expert researchers and practitioners, in order to encompass relevant and still open issues.

Nine research questions were derived from the objective of the study. Answering these questions led to a detailed investigation of practices arising from the identified approaches, which support both industrial and academic activities. The research questions, and the rationale for their inclusion, are detailed below.

Q1. Which testing strategies are adopted by the SPL Testing approaches? This question is intended to identify the testing strategies adopted by a software product line approach [79]. By strategy, we mean understanding when assets are tested, considering the differentiation between the two SPL development processes: core asset development and product development.

Q2. What are the existing static and dynamic analysis techniques applied in the SPL context? This question is intended to identify the analysis type (static and dynamic testing [54]) applied along the software development life cycle.

Q3. Which testing levels commonly applicable in single-system development are also used in the SPL approaches? Ammann and Offutt [4] and Jaring et al. [29] advocate different levels of testing (unit, integration, system and acceptance tests), where each level is associated with a development phase, emphasizing development and testing equally.

Q4. How do the product line approaches handle regression testing along the software product line life cycle? Regression testing is performed when changes are made to already tested artifacts [36,76]. Regression tests are often automated, since test cases related to the core assets may be repeated every time a new product is derived [63]. Thus, this question investigates the regression techniques applied to SPL.

Q5. How do the SPL approaches deal with tests of non-functional requirements? This question seeks clarification on how tests of non-functional requirements should be handled.

Q6. How do the testing approaches in an SPL organization handle commonality and variability? An undiscovered defect in the common core assets of an SPL will affect all applications and thus will have a severe effect on the overall quality of the SPL [68]. In this sense, answering this question requires an investigation into how the testing approaches handle commonality issues throughout the software life cycle, as well as gathering information on how variability affects testability.
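To make the commonality/variability vocabulary in Q6 concrete, here is a minimal, purely hypothetical sketch (not taken from any surveyed approach): a core asset with one variation point, and a single common test that is re-applied to each product once its variant is bound. The class and variant names are invented for illustration.

```python
class MessageLog:
    """Core asset: common behaviour plus a 'persistence' variation point."""

    def __init__(self, persistence: str):
        if persistence not in ("memory", "file"):  # variant bound at construction
            raise ValueError("unknown variant")
        self.persistence = persistence
        self._items = []

    def add(self, msg: str):
        self._items.append(msg)   # commonality: behaviour shared by all products

    def count(self) -> int:
        return len(self._items)


def core_asset_test(product: MessageLog) -> bool:
    """One common test, re-run for every product derived from the line."""
    product.add("hello")
    return product.count() == 1


# Product derivation: each configuration binds the variant differently,
# yet the common test is reused unchanged across all of them.
products = [MessageLog("memory"), MessageLog("file")]
assert all(core_asset_test(p) for p in products)
print("core-asset test passed for all variants")
```

A defect in `add` here would surface in every derived product, which is exactly why Q6 stresses defects in common core assets; and the moment at which `persistence` is bound (here, construction time) is the kind of decision Q7 examines next.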

• The lists of conferences and journals used in the search for primary studies are available in Appendices B and C.

After performing the search for publications in conferences and journals, using digital libraries and proceedings, we noticed that well-known publications, commonly referenced by other studies in this field, such as important technical reports and theses, had not been included in our results list. We thus decided to include these grey literature entries. Grey literature describes materials not published commercially or indexed by major databases.

    5.3. Studies selection

The set of search strings was thus applied within the search engines, specifically those mentioned in the previous section. The studies selection involved a screening process composed of three filters, in order to select the most suitable results, since the likelihood of retrieving inadequate studies might be high. Fig. 2 briefly describes what was considered in each filter. Moreover, the figure depicts the number of studies remaining after applying each filter.

The inclusion criteria were used to select all studies during the search step. After that, the exclusion criteria were applied, first to the studies' titles and then to the abstracts and conclusions. Regarding the inclusion criteria, the studies were included if they involved:


Q7. How do variant binding times affect SPL testability? According to [29], variant binding time determines whether a test can be performed at a given development or deployment phase. Thus, the identification and analysis of the suitable moment to bind a variant determines the appropriate testing technique to handle the specific variant.

Q8. How do the SPL approaches deal with test effort reduction? The objective is to analyze, within the selected approaches, the most suitable ways to achieve effort reduction, as well as to understand how they can be accomplished within the testing levels.

Q9. Do the approaches define any measures to evaluate the testing activities? This question requires an investigation into the data collected by the various SPL approaches with respect to testing activities.

    5. Data collection

In order to answer the research questions, data was collected from the research literature. These activities involved developing a search strategy, identifying data sources, selecting studies to analyze, and analyzing and synthesizing the data.

    5.1. Search strategy

The search strategy was developed by reviewing the data needed to answer each of the research questions.

The initial set of keywords was refined after a preliminary search returned too many results of little relevance. We used several combinations of search terms until we had achieved a suitable set of keywords. These are: Verification, Validation; Product Line, Product Family; Static Analysis, Dynamic Analysis; Variability, Commonality, Binding; Test Level; Test Effort, Test Measure; Non-functional Testing; Regression Testing, Test Automation, Testing Framework, Performance, Security, Evaluation, Validation, as well as their similar nouns and syntactic variations (e.g. plural forms). All terms were combined with the terms Product Line and Product Family using the Boolean AND operator. They were all joined with each other using the OR operator, in order to improve the completeness of the results. The complete list of search strings is available in Table 1 and also in a website developed to show detailed information on this MS.1
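As an illustration of how the strings in Table 1 are assembled, the combination rule described above (each topic term ANDed with the OR-joined product-line terms) can be sketched as follows; the function and variable names are our own, not from the study:

```python
# Sketch of the search-string construction described above: each topic
# keyword is combined via AND with the OR-joined product-line terms.
PL_TERMS = ["product line", "product family", "SPL"]

def build_search_string(topic: str) -> str:
    """Return one Table 1-style search string for a topic keyword."""
    return f"{topic} AND ({' OR '.join(PL_TERMS)})"

# Reproduces, e.g., string 12 of Table 1:
print(build_search_string("Test automation"))
# Test automation AND (product line OR product family OR SPL)
```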

    5.2. Data sources

The search included important journals and conferences regarding the research topic, such as Software Engineering, SPL, Software Verification, Validation and Testing, and Software Quality. The search was also performed using the snowballing process, following up the references in papers, and it was extended to include grey literature sources, seeking relevant white papers, industrial (and technical) reports, theses, work-in-progress, and books.

We restricted the search to studies published up to December 2009. We did not establish a lower year limit, since our intention was to have a broader coverage of this research field. This was decided because many important issues that emerged ten or more years ago are still considered open issues, as pointed out in [7,31].

The initial step was to perform a search using the terms described in Subsection 5.1 in the digital libraries' web search engines. We considered publications retrieved from the ScienceDirect, SCOPUS, IEEE Xplore, ACM Digital Library and Springer Link tools.

The second step was to search within top international, peer-reviewed journals published by Elsevier, IEEE, ACM and Springer, since they are considered the world's leading publishers of high-quality publications [11].

1 http://www.cin.ufpe.br/sople/testing/ms/

Next, conference proceedings were also searched. In cases where a conference keeps its proceedings available on a website, we accessed the website. When proceedings were not available on the conference website, the search was done through the DBLP Computer Science Bibliography.2

When searching conference proceedings and journals, many of the results had already been found in the search through digital libraries. In this case, we discarded the later results, considering only the first ones, which had already been included in our results list.

Table 1
List of research strings.

1 Verification AND validation AND (product line OR product family OR SPL)
2 Static analysis AND (product line OR product family OR SPL)
3 Dynamic testing AND (product line OR product family OR SPL)
4 Dynamic analysis AND (product line OR product family OR SPL)
5 Test AND level AND (product line OR product family OR SPL)
6 Variability OR commonality AND testing
7 Variability AND commonality AND testing AND (product line OR product family OR SPL)
8 Binding AND test AND (product line OR product family OR SPL)
9 Test AND effort reduction AND (product line OR product family OR SPL)
10 Test effort AND (product line OR product family OR SPL)
11 Test effort reduction AND (product line OR product family OR SPL)
12 Test automation AND (product line OR product family OR SPL)
13 Regression test AND (product line OR product family OR SPL)
14 Non-functional test AND (product line OR product family OR SPL)
15 Measure AND test AND (product line OR product family OR SPL)
16 Testing framework AND (product line OR product family OR SPL)
17 Performance OR security AND (product line OR product family OR SPL)
18 Evaluation OR validation AND (product line OR product family OR SPL)

2 http://www.informatik.uni-trier.de/ley/db/



SPL approaches which address testing concerns. Approaches that include information on methods and techniques and how they are handled, and how variabilities and commonalities influence software testability.

SPL testing approaches which address static and dynamic analysis. Approaches that explicitly describe how static and dynamic testing applies to different testing phases.

SPL testing approaches which address software testing effort concerns. Approaches that describe the existence of automated tools, as well as other strategies used to reduce test effort, and metrics applied in this context.

    Studies were excluded if they involved:

SPL approaches with insufficient information on testing. Studies that do not have detailed information on how they handle SPL testing concepts and activities.

Duplicated studies. When the same study was published in different papers, the most recent was included.

    Or if the study had already been included from another source.

Fig. 3 depicts a bar chart with the results categorized by source and filter, as described in Section 5.2. Fig. 4 shows the distribution of the primary studies by publication year. This figure gives the impression that the SPL Testing area is attracting growing interest; the increasing number of publications suggests that many solutions have recently become available (disregarding 2009, since many studies might not have been made available by search engines by the time the search was performed, and thus were not considered in this study).

An important point to highlight is that, between 2004 and 2008, an important international workshop devoted specifically to SPL testing, the SPLiT workshop,3 demonstrated the interest of the research community in expanding this field. Fig. 5 shows the number of publications by source; the peaks in Fig. 4 match the years in which this workshop occurred. All the studies are listed in Appendix A.

Fig. 5. Amount of studies vs. sources.

5.3.1. Reliability of inclusion decisions
The reliability of decisions to include a study is ensured by having multiple researchers evaluate each study. The study was conducted by two research assistants (the two first authors), who were responsible for performing the searches and summarizing the results of the mapping study, with other members of the team acting as reviewers. A high level of agreement existed before a study was included. In cases where the researchers did not agree after discussion, an expert in the area was contacted to discuss and give appropriate guidance.

3 c.f. http://www.biglever.com/split2008/

5.4. Quality evaluation

In addition to the general inclusion/exclusion criteria, a quality evaluation mechanism, usually applied in systematic reviews [18,19,44], was applied in this study in order to assess the trustworthiness of the primary studies. This assessment is necessary to limit bias in conducting this empirical study, to gain insight into potential comparisons, and to guide the interpretation of findings.

The quality criteria we used served as a means of weighting the importance of individual studies, enhancing our understanding, and developing more confidence in the analysis.

As the mapping study guidelines [67] do not establish a formal evaluation in the sense of quality criteria, we chose to assess each of the primary studies by principles of good practice for conducting empirical research in software engineering [41], tailoring the idea of assessing studies by a set of criteria to our specific context.

Thus, the quality criteria for this evaluation are presented in Table 2. The criteria grouped as A cover a set of issues pertaining to quality that need to be considered when appraising the studies identified in the review, according to [42]. Groups B and C assess the quality considering SPL Testing concerns. The former focuses on identifying how well the studies address testing issues

Table 2
Quality criteria.

    Group ID Quality criteria

A 1 Are there any roles described?
2 Are there any guidelines described?
3 Are there inputs and outputs described?
4 Does it detail the test artifacts?

B 5 Does it detail the validation phase?
6 Does it detail the verification phase?
7 Does it deal with Testing in the Requirements phase?
8 Does it deal with Testing in the Architectural phase?
9 Does it deal with Testing in the Implementation phase?
10 Does it deal with Testing in the Deployment phase?

C 11 Does it deal with binding time?
12 Does it deal with variability testing?
13 Does it deal with commonality testing?
14 Does it deal with effort reduction?
15 Does it deal with non-functional tests?
16 Does it deal with any test measure?

along the SPL development life cycle, which is usually composed of scoping, requirements, design, implementation and testing phases. The latter evaluated how well our research questions were addressed by individual studies. This way, a better quality score matched studies which covered a larger number of questions.

Fig. 6. Distribution of papers according to classification scheme.

The main purpose of this grouping is justified by the difficulty faced in establishing a reliable relationship between the final quality score and the real quality of each study. Some primary studies (e.g. one which addresses some issue in a very detailed way) are referenced in several other primary studies, but if we apply the complete set of quality criteria items, the final score is lower than that of others which do not have the same relevance. This way, we intended to have a more valid and reliable quality assessment instrument.

Each of the 45 studies was assessed independently by the researchers according to the 16 criteria shown in Table 2. Taken together, these criteria provided a measure of the extent to which we could be confident that a particular study could give a valuable contribution to the mapping study. Each of the studies was graded on a trichotomous (yes, partly or no) scale, tagged 1, 0.5 and 0, respectively. We did not use the grade to serve as a threshold for the inclusion decision, but rather to identify the primary studies that would form a valid foundation for our study. We note that, overall, the quality of the studies was good. It is possible to check every grade in Appendix A, where the most relevant are highlighted.
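The grading scheme above can be expressed as a small scoring routine. This is an illustrative sketch, not tooling used in the study; the function name and the example grades are our own:

```python
# Sketch of the trichotomous quality scoring described above: each of the
# 16 criteria in Table 2 is graded yes/partly/no, tagged 1, 0.5 and 0.
GRADE_VALUES = {"yes": 1.0, "partly": 0.5, "no": 0.0}

def quality_score(grades):
    """Sum the numeric values of the per-criterion grades for one study."""
    return sum(GRADE_VALUES[g] for g in grades.values())

# A hypothetical study: criteria 1-8 'yes', 9-12 'partly', 13-16 'no'.
example = {c: "yes" for c in range(1, 9)}
example.update({c: "partly" for c in range(9, 13)})
example.update({c: "no" for c in range(13, 17)})
print(quality_score(example))  # 10.0
```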

    5.5. Data extraction

The data extraction forms must be designed to collect all the information needed to address the research questions and the quality criteria. The following information was extracted from each study: title and authors; source (conference/journal); publication year; the answers to the research questions addressed by the study; a summary, giving a brief overview of its strengths and weak points; the quality criteria score according to Table 2; the reviewer's name; and the date of the review.

Fig. 7. Distribution of papers according to intervention.

At the beginning of the study, we decided that when several studies were reported in the same paper, each relevant study would be treated separately. However, this situation did not occur.

    6. Outcomes

In this section, we describe the classification scheme and the results of data extraction. With the classification scheme in place, the relevant studies are sorted into the scheme, which is the actual data extraction process. The result of this process is the mapping of studies, as presented at the end of this section, together with concluding remarks.

    6.1. Classication scheme

We decided to use the idea of categorizing studies in facets, as described by Petersen et al. [67], since we considered this a structured way of doing such a task. Our classification scheme assembled two facets. One facet structured the topic in terms of the research questions we defined. The other considered the type of research.

For the second, our study used the classification of research approaches described by Wieringa et al. [82]. According to Petersen et al. [67], who also used this approach, the research facet, which reflects the research approach used in the papers, is general and independent of a specific focus area. The classes that form the research facet are described in Table 3.

The classification was performed after applying the filtering process, i.e. only the final set of studies was classified and considered. The results of the classification are presented at the end of this section (Fig. 8).

    6.2. Results

In this sub-section, each topic presents the findings of a sub-research question, highlighting evidence gathered from the data extraction process. These results populate the classification scheme, which evolves while doing the data extraction.

6.2.1. Testing strategy
By analyzing the primary studies, we have found a wide variety of testing strategies. Tevanlinna and Reuys, respectively [75] and [79], present a similar set of strategies for SPL testing development, which are applicable to any development effort since the descriptions of the strategies are generic. We herein use the titles of the topics they outlined, after making some adjustments, as a structure for aggregating other studies which use a similar approach, as follows:

Testing product by product: This approach ignores the possibility of reuse benefits. It offers the best guarantee of product quality but is extremely costly. In [30], a similar approach is presented, named the pure application strategy, in which testing is performed only for a concrete product during product development. No test is performed in core asset development. Moreover, in this strategy, tests for each derived application are developed independently from each other, which results in an extremely high test effort, as pointed out by [75]. This testing strategy is similar to testing in single-product engineering, because without reuse the same test effort is required for each new application.

Table 3
Research type facet.

Classes Description

Validation research: Techniques investigated are novel and have not yet been implemented in practice. Techniques used are for example experiments, i.e., work done in the lab.

Evaluation research: Techniques are implemented in practice and an evaluation of the technique is conducted. That means, it is shown how the technique is implemented in practice (solution implementation) and what are the consequences of the implementation in terms of benefits and drawbacks (implementation evaluation). This also includes the identification of problems in industry.

Solution proposal: A solution for a problem is proposed; the solution can be either novel or a significant extension of an existing technique. The potential benefits and the applicability of the solution are shown by a small example or a good line of argumentation.

Philosophical papers: These papers sketch a new way of looking at existing things by structuring the field in the form of a taxonomy or conceptual framework.

Opinion papers: These papers express the personal opinion of somebody on whether a certain technique is good or bad, or how things should be done. They do not rely on related work and research methodologies.

Experience papers: Experience papers explain what and how something has been done in practice. It has to be the personal experience of the author.

Incremental testing of product lines: The first product is tested individually and the following products are tested using regression testing techniques [26,76]. Regression testing focuses on ensuring that everything that used to work still works, i.e. the product features previously tested are re-tested through a regression technique.

Opportunistic reuse of test assets: This strategy is applied to reuse application test assets. Assets for one application are developed; then, the applications derived from the product line use the assets developed for the first application. This form of reuse is not performed systematically, which means that there is no method that supports the activity of selecting the test assets [75].

Design test assets for reuse: Test assets are created as early as possible in domain engineering. Domain testing aims at testing common parts and preparing for testing variable parts [30]. In application engineering, these test assets are reused, extended and refined to test specific applications [30,75]. General approaches to achieve core asset reuse are: repository, core asset certification, and partial integration [84]. Kishi and Noda [39] state that a verification model can be shared among applications that have similarities. The SPL principle of design for reuse is fully addressed by this strategy, which can enable the overall goals of reducing cost, shortening time-to-market, and increasing quality [75].

Fig. 8. Visualization of a systematic map in the form of a bubble plot.

Division of responsibilities: This strategy relates to selecting the testing levels to be applied in both domain and application engineering, depending upon the objective of each phase, i.e. whether thinking about developing for or with reuse [79]. This division can be clearly seen when the assets are unit tested in domain engineering and, when instantiated in application engineering, integration, system and acceptance testing are performed.

As SPL Testing should be a reuse-based test derivation for testing products within a product line [84], the Testing product by product and Opportunistic reuse of test assets strategies cannot be considered effective for the SPL context, since the first does not consider the reuse benefits, which results in testing costs resembling those of single-systems development. In the second, no method is applied; hence, the activity may not be repeatable, and may not avoid the redundant re-execution of test cases, which can thus increase costs.

These strategies can be considered a feasible grouping of what studies on SPL testing approaches have been addressing, which can give us a more generic view of the topic.
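To make the "design test assets for reuse" strategy concrete, the following sketch shows a domain-engineering test asset refined per product in application engineering. The component, salutation values and test names are invented for illustration and do not come from any surveyed approach:

```python
# Sketch: an abstract test asset created in domain engineering exercises the
# common behavior; each product's test class binds the variable part.
import unittest

def make_greeting(name, salutation):
    # Core-asset function; `salutation` is the variable part (variation point).
    return f"{salutation}, {name}!"

class AbstractGreetingTest:
    """Domain-engineering test asset: reused and refined, not run directly."""
    salutation = None  # bound by each product-specific subclass

    def test_contains_name(self):
        assert "Alice" in make_greeting("Alice", self.salutation)

class FormalProductTest(AbstractGreetingTest, unittest.TestCase):
    salutation = "Good day"  # product-specific refinement

class CasualProductTest(AbstractGreetingTest, unittest.TestCase):
    salutation = "Hi"

result = FormalProductTest("test_contains_name").run()
print(result.wasSuccessful())  # True
```

The abstract class is written once against the common behavior; deriving a new product's test suite reduces to binding its variant values, which is the reuse the strategy aims for.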

6.2.2. Static and dynamic analysis
An effective quality strategy for a software product line requires both static and dynamic analysis techniques. Techniques for static analysis are often dismissed as more expensive, but in a software product line, the cost of static analysis can be amortized over multiple products.


A number of studies advocate the use of inspections and walkthroughs [29,54,79] and formal verification techniques as static analysis techniques/methods for SPL, to be conducted prior to dynamic analysis, i.e. prior to the presence of executable code. [54] presents an approach for Guided Inspection, aimed at applying the discipline of testing to the review of non-software assets. In [39], a model checker is defined that focuses on design verification instead of code verification. This strategy is considered effective because many defects are injected during the design phase [39].

Regarding dynamic analysis, some studies [29,47] recommend the V-model phases, commonly used in single systems, to structure a series of dynamic analyses. The V-model gives equal weight to development and testing rather than treating testing as an afterthought [25]. However, despite the well-defined test process presented by the V-model, its use in the SPL context requires some adaptation, as applied in [29].

The relative amount of dynamic and static analysis depends on both technical and managerial strategies. Technically, factors such as test-first development or model-based development determine the focus. Model-based development emphasizes static analysis of models, while test-first development emphasizes dynamic analysis. Managerial strategies such as reduced time to market, lower cost and improved product quality determine the depth to which analysis should be carried out.

6.2.3. Testing levels
Some of the analyzed studies (e.g. [29,47]) divide SPL testing according to the two primary software product line activities: core asset and product development.

Core asset development: Some testing activities are related to the development of test assets and the test execution performed to evaluate the quality of the assets, which will be further instantiated in the application engineering phase. The two basic activities are developing test artifacts that can be reused efficiently during application engineering and applying tests to the other assets created during domain engineering [34,70]. Regarding types of testing, the following are performed in domain engineering:

Unit testing: Testing of the smallest unit of software implementation. This unit can be a class, a module, a function, or a software component. The granularity level depends on the strategy adopted. The purpose of unit testing is to determine whether this basic element performs as required, through verification of the code produced during the coding phase.

Integration testing: This testing is applied as the modules are integrated with each other or within the reference architecture, in domain-level V&V, when the architecture calls for specific domain components to be integrated in multiple systems. This type of testing is also performed during application engineering [55]. Li et al. [49] present an approach for generating integration tests from unit tests.

Product development: Activities here are related to the selection and instantiation of assets to build product-specific test assets, the design of additional product-specific tests, and test execution. The following types of testing can be performed in application engineering:

System testing: System testing ensures that the final product matches the required features [61]. According to [24], system testing evaluates the features and functions of an entire product and validates that the system works the way the user expects. A form of system testing can be carried out on the software architecture using a static analysis approach.

Acceptance testing: Acceptance testing is conducted by the customer, but often the developing organization will create and execute a preliminary set of acceptance tests. In a software product line organization, commonality among the tests needed for the various products is leveraged to reduce costs.

A similar division is stated by [55], in which the author defines two separate test processes used in a product line organization: Core Asset Testing and Product Testing.

Some authors [64,75,83] also include system testing in core asset development. The rationale for including such a level is to produce abstract test assets to be further reused and adapted when deriving products in the product development phase.

6.2.4. Regression testing
Even though regression testing techniques have been researched for many years, as stated in [21,26,76], no study gives evidence on regression testing practices applied to SPL. Some information is presented by a few studies [46,57], where just a brief overview of the importance of regression testing is given, but they do not take into account the issues specific to SPLs.

McGregor [54] reports that when a core asset is modified due to evolution or correction, it is tested using a blend of regression testing and development testing. According to him, the modified portion of the asset should be exercised using:

Existing functional tests, if the specification of the asset has not changed;
New functional tests, created and executed if the specification has changed; and
Structural tests, created to cover the new code created during the modification.

He also highlights the importance of regression test selection techniques and the automation of regression execution.

Kauppinen and Taina [37] advocate that the testing process should be iterative: based on test execution results, new test cases should be generated and test scripts may be updated during a modification. These test cases are repeated during regression testing each time a modification is made.

Kolb [45] highlights that the major problems in an SPL context are the large number of variations and their combinations, redundant work, the interplay between generic components and product-specific components, and regression testing.

Jin-hua et al. [30] emphasize the importance of regression testing when a component or a related component cluster is changed, saying that regression testing is crucial to perform on the application architecture, aiming to evaluate the application architecture against its specification. Some researchers have also developed approaches to evaluate architecture-based software by using regression testing [27,58,59].

6.2.5. Non-functional testing
Non-functional issues have a great impact on the architecture design, where predictability of the non-functional characteristics of any application derived from the SPL is crucial for any resource-constrained product. These characteristics are well-known quality attributes, such as response time, performance, availability, and scalability, that might differ between instances of a product line. According to [23], testing non-functional quality attributes is equally important as functional testing.

By analyzing the studies, it was noticed that some of them propose the creation or execution of non-functional tests. Reis and Metzger [72] present a technique to support the development of reusable performance test scenarios to be further reused in application engineering. Feng et al. [22] highlight the importance of non-functional concerns (performance, reliability, dependability,

etc.). Ganesan et al. [23] describe work intended to develop an environment for testing the response time and load of a product line; however, due to the constrained experimental environment, no visible performance degradation was observed.

In single-system development, different non-functional testing techniques are applicable for different types of testing; the same might hold for SPL, but no experience reports were found to support this statement.

6.2.6. Commonality and variability testing
Commonality, as an inherent concept in SPL theory, is naturally addressed by many studies, such as Pohl et al. [70], in which the major task of domain testing is the development of common test artifacts to be further reused in application testing.

The increasing size and complexity of applications can result in a higher number of variation points and variants, which makes testing all combinations of the functionality almost impossible in practice. Managing variability and testability is a trade-off. The large amount of variability in a product line increases the number of possible testing combinations. Thus, testing techniques that consider variability issues and thus reduce effort are required.

Cohen et al. [14] introduce cumulative variability coverage, which accumulates coverage information through a series of development activities, to be further exploited in targeted testing activities for product line instances.

Another solution, proposed by Kolb and Muthig [47], is the imposition of constraints on the architecture. Instead of having components with a large amount of variability, it is better for testability to separate commonalities and variabilities and to encapsulate variabilities as subcomponents. Aiming to reduce the retesting of components and products when modifications are performed, independence of features and components, as well as the reduction of side effects, reduces the effort required for adequate testing.

Tevanlinna et al. [79] highlight the importance of asset traceability from requirements to implementation. There are some ways to achieve this traceability between test assets and implementation, as reported by McGregor et al. [52], in which the design of each product line test asset matches the variation implementation mechanism for a component.

The selected approaches handle variability in a range of different manners, usually making variability explicit as early as possible in UML use cases [28,35,77] that will later be used to design test cases, as described in the requirement-based approaches [8,60]. Moreover, model-based approaches introduce variability into test models, created through use cases and their scenarios [74,75], or specify variability in feature models and activity diagrams [64,66]. They are usually concerned with reusing test cases in a systematic manner through variability handling, as [3,83] report.

6.2.7. Variant binding time
According to [52], the binding of different variants requires different binding times (Compile Time, Link Time, Execution Time and Post-Execution Time), which in turn require different mechanisms (e.g. inheritance, parameterization, overloading and conditional compilation). These are suitable for different variability implementation schemes. The different mechanisms result in different types of defects, test strategies, and test processes.

    This issue is also addressed by Jaring et al. [29], in their Variability and Testability Interaction Model, which is responsible for modeling the interaction between variability binding and testability in the context of the V-model. The decision regarding the best moment to test a variant is clearly important. The earliest point at which a decision is bound is the point at which the binding should be tested.

    In our findings, the approach presented in [75] deals with testing variant binding time as a form of ensuring that the application comprises the correct set of features, as the customer expects. After performing the traditional test phases in application engineering, the approach suggests tests to be performed towards verifying whether the application contains the set of functionalities required, and nothing else.

6.2.8. Effort reduction

    Some authors consider testing the bottleneck in SPL, since testing product lines is becoming more costly than testing single systems [45,47]. Although applications in an SPL share common components, they must be tested individually at the system testing level. This high cost makes testing an attractive target for improvements [63]. Test effort reduction strategies can have a significant impact on productivity and profitability [53]. We found some strategies regarding this issue. They are described as follows:

    Reuse of test assets: Test assets, mainly test cases, test scenarios and test results [53], are created to be reusable, which consequently reduces effort. According to [37,84], one approach to achieving the reuse of core assets comes from the existence of an asset repository. It usually requires an initial testing effort for its construction, but throughout the process these assets do not need to be rebuilt; they can rather be used as is. Another strategy considers the creation of test assets as extensively as possible in domain engineering, anticipating the variabilities by creating document templates and abstract test cases. Test cases and other concrete assets are used as is, and the abstract ones are extended or refined to test the product-specific aspects in application engineering. In [50], a method for monitoring the interfaces of every component during test execution is proposed, observing commonality issues in order to avoid repetitive execution. As mentioned before in Section 6.2.6, the systematic reuse of test assets, especially test cases, is the focus of many studies, each offering novel and/or extended approaches. The reason for dealing with asset reuse in a systematic manner is that it can enable effort reduction, since redundant work may be avoided when deriving many products from the product line. In this context, the search for an effective approach has been noticeable throughout recent years, as can be seen in [53,55,61,66,75]. Hence, it is feasible to infer that there is not yet a general solution for dealing with systematic reuse in SPL testing.

    Test automation tools: Automatic testing tools to support testing activities [16] are a way to achieve effort reduction. Methods have been proposed to automatically generate test cases from single-system models, expecting to reduce testing effort [28,49,60], such as mapping the models of an SPL to functional test cases in order to automatically generate and select functional test cases for a derived application [65]. Automatic test execution is an activity that should be carefully managed to avoid false failures, since unanticipated or unreported changes can occur in the component under test. These changes should be reflected in the corresponding automated tests [16].
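The asset-reuse strategy described above, creating abstract test assets in domain engineering and binding them per product in application engineering, can be sketched minimally as follows; all names here are hypothetical, not drawn from the surveyed approaches. The domain-level check is written once against the common behavior and parameterized over the part that varies, and each derived product supplies its own binding.

```c
#include <string.h>
#include <stdbool.h>

/* Domain-level ("abstract") test asset: verifies behavior common to all
 * products, parameterized over the product-specific variation point. */
static bool check_banner(const char *(*banner)(void), const char *expected) {
    return strcmp(banner(), expected) == 0;
}

/* Product-specific bindings, supplied during application engineering. */
static const char *basic_banner(void)   { return "Welcome"; }
static const char *premium_banner(void) { return "Welcome, premium user"; }
```

Each application then reuses `check_banner` as is, e.g. `check_banner(basic_banner, "Welcome")`, so only the binding and the expected value are product-specific.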

6.2.9. Test measurement

    Test measurement is an important activity applied in order to calibrate and adjust approaches. The adequacy of testing can be measured based on the concept of a coverage criterion. Metrics related to test coverage are applied to extract information, and are useful for the whole project. We investigated how test coverage has been applied by existing approaches regarding SPL issues.

    According to [79], there is only one way to completely guarantee that a program is fault-free: to execute it on all possible inputs, which is usually impossible or at least impractical. It is even more difficult if the variations and all their constraints are considered. Test coverage criteria are a way to measure how completely a test suite exercises the capabilities of a piece of software. These measures can be used to define the space of inputs to a program. It is possible to systematically sample this space and test only a portion of the feasible system behavior [14]. The use of covering arrays as a test coverage strategy is addressed in [14]. Kauppinen and Tevanlinna [38] define coverage criteria for estimating the adequacy of testing in an SPL context. They propose two coverage criteria for framework-based product lines: hook and template coverage; that is, variation points open for customization in a framework are implemented as hook classes and stable parts as template classes. They are used to measure the coverage of frameworks or other collections of classes in an application by counting the structures or hook method references from them, instead of single methods or classes.
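The covering-array idea of systematically sampling the configuration space can be illustrated with a small sketch (the configurations below are illustrative, not taken from [14]). For three optional features there are 2^3 = 8 possible products, yet four configurations already achieve pairwise coverage: every pair of features takes all four on/off value combinations somewhere in the sample.

```c
#include <stdbool.h>

enum { NUM_FEATURES = 3, NUM_CONFIGS = 4 };

/* Four sampled configurations out of the 8 possible products:
 * rows form a pairwise covering array over three boolean features. */
static const bool configs[NUM_CONFIGS][NUM_FEATURES] = {
    {false, false, false},
    {false, true,  true },
    {true,  false, true },
    {true,  true,  false},
};

/* Returns true if every pair of features takes every combination of
 * on/off values in some sampled configuration (pairwise coverage). */
static bool covers_all_pairs(void) {
    for (int f1 = 0; f1 < NUM_FEATURES; f1++)
        for (int f2 = f1 + 1; f2 < NUM_FEATURES; f2++)
            for (int v1 = 0; v1 < 2; v1++)
                for (int v2 = 0; v2 < 2; v2++) {
                    bool found = false;
                    for (int c = 0; c < NUM_CONFIGS; c++)
                        if (configs[c][f1] == (bool)v1 &&
                            configs[c][f2] == (bool)v2)
                            found = true;
                    if (!found)
                        return false;
                }
    return true;
}
```

Testing the four sampled products instead of all eight halves the effort here; the saving grows rapidly as the number of features increases.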

    6.3. Analysis of the results and mapping of studies

    The analysis of the results enables us to present the number of studies that match each category addressed in this study. This makes it possible to identify what has been emphasized in past research, and thus to identify gaps and possibilities for future research [67].

    Initially, let us analyze the distribution of studies from our analysis point of view. Figs. 6 and 7 present, respectively, the frequencies of publications according to the classes of the research facet and according to the research questions addressed by them (represented by Q1 to Q9). Table 4 details Fig. 7, showing which papers answer each research question. It is worth mentioning that, in both categories, it was possible for a study to match more than one topic. Hence, the total amount verified in Figs. 6 and 7 exceeds the final set of primary studies selected for detailed analysis.

    When merging these two categories, we have a quick overview of the evidence gathered from the analysis of the SPL testing field. We used a bubble plot to represent the interconnected frequencies, as shown in Fig. 8. This is basically an x-y scatterplot with bubbles at category intersections. The size of a bubble is proportional to the number of articles that are in the pair of categories corresponding to the bubble coordinates [67].

    The classification scheme applied in this paper enabled us to infer that researchers are mostly in the business of proposing new techniques and investigating their properties, more than evaluating and/or experiencing them in practice, as seen in Fig. 8. Solution Proposal is the topic with the most entries, considering the research facets. Within this facet, most studies address the questions Q1 (testing strategies), Q3 (testing levels), Q6 (commonality and variability analysis) and Q8 (effort reduction). These have really been the overall focus of researchers. On the other hand, we have pointed out topics in which new solutions are required: Q2 (static and dynamic analysis interconnection in SPL testing), Q4 (regression testing), Q5 (non-functional testing), Q7 (variant binding time) and Q9 (measures).

Table 4
Research questions (RQ) and primary studies.

RQ | Primary studies
Q1 | [3,8,9,20,29,30,35,38,39,45–47,54,55,64,66,72–75,83,84]
Q2 | [3,17,20,39,54]
Q3 | [3,20,24,29,36,34,30,46,47,49,50,54,55,57,64,61,69,73,75,83,84]
Q4 | [27,30,37,46,54,57]
Q5 | [22,23,54,55,60,72]
Q6 | [3,6,8,9,14,16,20,22,24,29,34,35,39,47,49,50,52,61,66,68,69,72–75,83,84]
Q7 | [14,29,30,52,68]
Q8 | [3,8,16,20,22,24,28,29,35–39,45–47,49,50,53,54,60–62,65,66,68,73–75,84]
Q9 | [3,27,30,36,62,66,75]

    Although some topics present a relevant number of entries in this analysis, such as Q1, Q3, Q6 and Q8, as aforementioned, these still lack field research, since the techniques investigated and proposed are mostly novel and have usually not yet been implemented in practice. We realize that, currently, Validation and Evaluation Research are weakly addressed in SPL testing papers. Regarding the maturity of the field in terms of validation and evaluation research and solution papers, other studies report results in line with our findings, e.g. [80]. Hence, we realize that this is not a problem solely of SPL testing; rather it involves, to a certain extent, other software engineering practices as well.

    We also realize that researchers are not concerned with Experience Reports on their personal experience using particular approaches. Practitioners in the field should report results on the real-world adoption of the techniques proposed and reported in the literature. Moreover, authors should express opinions about the desirable direction of SPL testing research, expressing their expert viewpoint.

    In fact, the volume of literature devoted to testing software product lines attests to the importance assigned to it by the product line community. In the following subsection we detail what we considered most relevant in our analysis.

6.3.1. Main findings of the study

    We identified a number of test strategies that have been applied to software product lines. Many of these strategies address different aspects of the testing process and can be applied simultaneously. However, we have no evidence about the effectiveness of combining strategies, nor about the contexts in which this could be suitable; the analyzed studies do not cover this potential. There is only a brief indication that the decision about which kind of strategy to adopt depends on a set of factors such as the software development process model, languages used, company and team size, delivery time, and budget. Moreover, it is a decision made in the planning stage of the product line organization, since the strategy affects activities that begin during requirements definition. But these remain hypotheses that need to be supported or refuted through formal experiments and/or case studies.

    A complete testing process should define both static and dynamic analyses. We found that even though some studies emphasize the importance of static analysis, few detail how it is performed in an SPL context [39,54,79], despite its relevance in single-system development. Static analysis is particularly important in a product line process, since many of the most useful assets are non-code assets, and the quality of the software architecture in particular is critical to success.

    Specific testing activities are divided across the two types of activities: domain engineering and application engineering. Alternatively, the testing activities can be grouped into core asset and product development. From the set of studies, four [29,30,36,20] adopt (or advocate the use of) the V-model as an approach to represent testing throughout the software development life cycle. As a widely adopted strategy in single-system development, tailoring the V-model to SPL could result in improved quality. However, there is no consensus on the correct set of testing levels for each SPL phase.

    We did not find evidence regarding the impact on the SPL of not performing a specific testing level in domain or application engineering. For example, is there any consequence if unit, integration or system testing is not performed in domain engineering? We need investigations to verify such an aspect. Moreover, what adaptations are needed for the V-model to be effective in the SPL context? This is a point where experimentation is welcome, in order to understand the behavior of testing levels in SPL.

    A number of the studies addressed, or assumed, that testing activities are automated (e.g. [16,49]). In a software product line, automation is more feasible because the resources required to automate are amortized over the larger number of products. The resources are also more narrowly focused due to the overlap of the products. Some of the studies illustrated that the use of domain-specific languages, and the tooling for those languages, is more feasible in a software product line context. Nevertheless, we need to understand whether the techniques are indeed effective when applied in an industrial context. We lack studies reporting results of this nature.

    According to [45], one of the major problems in testing product lines is the large number of variations. The study reinforces the importance of handling variability testing throughout the whole software life cycle.

    In particular, the effect of variant binding time concerns was considered in this study. A well-defined approach was found in [29], with information provided by case studies conducted at an important electronics manufacturer. However, there are still many issues to be considered regarding variation and testing, such as: what is the impact of designing variations in test assets with regard to effort reduction? What is the most suitable strategy for handling variability within test assets: use cases and test cases, or maybe sequence or class diagrams? How should traceability be handled, and what is the impact of not handling it, with respect to test assets? We also did not find information about the impact of different binding times for testing in SPL, e.g. compile time, scoping time, etc. We also lack evidence in this direction.

    Regression testing does not belong to any one point in the software development life cycle, and as a result there is a lack of clarity in how regression testing should be handled. Despite this, it is clear that regression testing is important in the SPL context. Regression testing techniques include approaches to selecting the smallest test suite that will still find the most likely defects, and techniques that make automation of test execution efficient.

    From the set of studies analyzed, a few addressed testing non-functional requirements [22,54,55,60,72]. They point out that during architecture design, static analysis can be used to give an early indication of problems with non-functional requirements. One important point that should be considered when testing quality attributes is the presence of trade-offs among them, for example the trade-off between modularity and testability. This leads to natural pairings of quality attributes and their associated tests. When a variation point represents a variation in a quality attribute, the static analysis should be sufficiently complete to investigate the different outcomes. Investigations towards making explicit which techniques currently applied in single-system development can be adopted in SPL are needed, since the studies do not address such an issue.

    Our mapping study has illustrated a number of areas in which additional investigation would be useful, especially regarding evaluation and validation research. In general, SPL testing lacks evidence in many respects. Regression test selection techniques, test automation and architecture-based regression testing are points for future research, as are techniques that address the relationships between variability and testing and techniques to handle traceability among test and development artifacts.

7. Threats to validity

    There are some threats to the validity of our study. They are described and detailed as follows:

    Research questions: The set of questions we defined might not have covered the whole SPL testing area, which implies that one may not find answers to the questions that concern them. As we considered this a feasible threat, we held several discussion meetings with project members and experts in the area in order to calibrate the questions. This way, even if we did not select the optimal set of questions, we attempted to deeply address the most asked and most widely considered open issues in the field.

    Publication bias: We cannot guarantee that all relevant primary studies were selected. It is possible that some relevant studies were not chosen during the search process. We mitigated this threat to the extent possible by following up the references in the primary studies.

    Unfamiliarity with other fields: The terms used in the search strings can have many synonyms, so it is possible that we overlooked some work.

    Quality evaluation: The quality attributes, as well as the weights used to quantify each of them, might not properly represent the attributes' importance. In order to mitigate this threat, the quality attributes were grouped into subsets to facilitate their further classification.

8. Concluding remarks and future work

    The main motivation for this work was to investigate the state of the art in SPL testing, by systematically mapping the literature in order to determine which issues have been studied, as well as by what means, and to provide a guide to aid researchers in planning future research. This research was conducted through a mapping study, a useful technique for identifying the areas where there is sufficient information for a systematic review to be effective, as well as those areas where more research is needed [12].

    The number of approaches that handle different and specific aspects of the SPL testing process (i.e. how to deal with variant binding time, regression testing and effort reduction) makes comparing the studies a hard task, since they do not deal with the same goals or focus. Nevertheless, through this study we were able to identify which activities are handled by the existing approaches, as well as to understand how researchers are developing work in SPL testing. Some research points were identified throughout this research, and these can be considered an important input into planning further research.

    Searching the literature, some important aspects are not reported, and when they are found only a brief overview is given. Regarding industrial experiences, we noticed they are rare in the literature. The existing case studies report small projects, containing results obtained from company-specific applications, which makes their reproduction in other contexts impracticable due to the lack of details. This scenario depicts the need for experimenting with SPL testing approaches not only in academia but also in industry. This study identified the growing interest in a well-defined SPL testing process, including tool support. Our findings in this sense are in line with a previous study conducted by Lamancha et al. [48], which reports on a systematic review on SPL testing, as mentioned in Section 2.

    This mapping study also points out some topics that need additional investigation, such as quality attribute testing considering variations in quality levels among products, how to maintain the traceability between development and test artifacts, and the management of variability through the whole development life cycle. Regarding the research method used, this study also contributed to improving the mapping study process, by defining and proposing new steps such as protocol definition, collection form and quality criteria.

    In our future agenda, we will combine the evidence identified in this work with evidence from controlled experiments and industrial SPL projects to define hypotheses and theories which will be the basis for designing new methods, processes, and tools for SPL testing.

Acknowledgments

    This work was partially supported by the National Institute of Science and Technology for Software Engineering (INES, http://www.ines.org.br), funded by CNPq and FACEPE, grants 573964/2008-4 and APQ-1037-1.03/08.

Appendix A. Quality studies scores

Id | Ref | Study title | Year | A | B | C
1 | Condron [16] | A domain approach to test automation of product lines | 2004 | 2 | 0 | 2
2 | Feng et al. [22] | A product line based aspect-oriented generative unit testing approach to building quality components | 2007 | 1.5 | 0 | 2.5
3 | Nebut et al. [60] | A requirement-based approach to test product families | 2003 | 2.5 | 1 | 1.5
4 | Reis and Metzger [72] | A reuse technique for performance testing of software product lines | 2006 | 1.5 | 2 | 3
5 | Kolb [45] | A risk-driven approach for efficiently testing software product lines | 2003 | 2 | 1 | 2.5
6 | Needham and Jones [62] | A software fault tree metric | 2006 | 0 | 0 | 1
7 | Hartmann et al. [28] | A UML-based approach for validating product lines | 2004 | 1 | 2 | 0.5
8 | Zeng et al. [84] | Analysis of testing effort by using core assets in software product line testing | 2004 | 1 | 1.5 | 2.5
9 | Harrold [27] | Architecture-based regression testing of evolving systems | 1998 | 0 | 0.5 | 2
10 | Li et al. [49] | Automatic integration test generation from unit tests of eXVantage product family | 2007 | 1 | 1 | 2
11 | McGregor [55] | Building reusable test assets for a product line | 2002 | 2 | 2 | 0.5
12 | Kolb and Muthig [46] | Challenges in testing software product lines | 2003 | 0 | 3 | 1.5
13 | Cohen et al. [14] | Coverage and adequacy in software product line testing | 2006 | 1 | 1.5 | 2
14 | Pohl and Sikora [69] | Documenting variability in test artefacts | 2005 | 1 | 0 | 1
15 | Kishi and Noda [39] | Formal verification and software product lines | 2006 | 2 | 1.5 | 2
16 | Kauppinen et al. [38] | Hook and template coverage criteria for testing framework-based software product families | 2004 | 0.5 | 0.5 | 3
17 | Reis et al. [73] | Integration testing in software product line engineering: a model-based technique | 2007 | 1 | 0 | 3
18 | Kolb and Muthig [47] | Making testing product lines more efficient by improving the testability of product line architectures | 2006 | 1 | 1.5 | 1.5
19 | Reuys et al. [74] | Model-based system testing of software product families | 2005 | 2 | 1 | 3.5
20 | Olimpiew and Gomaa [65] | Model-based testing for applications derived from software product lines | 2005 | 0 | 1 | 1
21 | Jaring et al. [29] | Modeling variability and testability interaction in software product line engineering | 2008 | 2.5 | 6 | 3.5
22 | Bertolino and Gnesi [8] | PLUTO: a test methodology for product families | 2003 | 0.5 | 1 | 3
23 | Olimpiew and Gomaa [66] | Reusable model-based testing | 2009 | 3 | 0.5 | 3.5
24 | Olimpiew and Gomaa [64] | Reusable system tests for applications derived from software product lines | 2005 | 2.5 | 1 | 1
25 | Li et al. [50] | Reuse execution traces to reduce testing of product lines | 2007 | 0 | 0.5 | 2
26 | Kauppinen and Taina [37] | RITA environment for testing framework-based software product lines | 2003 | 0 | 0 | 0.5
27 | Pohl and Metzger [68] | Software product line testing exploring principles and potential solutions | 2006 | 0.5 | 0 | 2.5
28 | McGregor [53] | Structuring test assets in a product line effort | 2001 | 1.5 | 1 | 0.5
29 | Nebut et al. [61] | System testing of product lines from requirements to test cases | 2006 | 0 | 2 | 2
30 | McGregor [54] | Testing a software product line | 2001 | 4 | 1.5 | 2
31 | Denger and Kolb [17] | Testing and inspecting reusable product line components: first empirical results | 2006 | 0 | 1 | 0.5
32 | Kauppinen [36] | Testing framework-based software product lines | 2003 | 0.5 | 0.5 | 2
33 | Edwin [20] | Testing in software product line | 2007 | 2 | 2.5 | 2
34 | Al-Dallal and Sorenson [3] | Testing software assets of framework-based product families during application engineering stage | 2008 | 3 | 1 | 4
35 | Kamsties et al. [34] | Testing variabilities in use case models | 2003 | 0.5 | 1.5 | 1.5
36 | McGregor et al. [52] | Testing variability in a software product line | 2004 | 0 | 1 | 2.5
37 | Reuys et al. [75] | The ScenTED method for testing software product lines | 2006 | 3 | 1 | 4.5
38 | Jin-hua et al. [30] | The W-Model for testing software product lines | 2008 | 1 | 3 | 1.5
39 | Kang et al. [35] | Towards a formal framework for product line test development | 2007 | 2 | 2 | 1
40 | Lamancha and Macario Polo Usaola [6] | Towards an automated testing framework to manage variability using the UML testing profile | 2009 | 0 | 0 | 1
41 | Wübbeke [83] | Towards an efficient reuse of test cases for software product lines | 2008 | 0 | 0 | 2
42 | Geppert et al. [24] | Towards generating acceptance tests for product lines | 2004 | 0.5 | 1.5 | 2
43 | Muccini and van der Hoek [57] | Towards testing product line architectures | 2003 | 0 | 2.5 | 1
44 | Ganesan et al. [23] | Towards testing response time of instances of a web-based product line | 2005 | 1 | 1.5 | 1
45 | Bertolino and Gnesi [9] | Use case-based testing of product lines | 2003 | 1 | 1 | 2.5

The shaded lines represent the most relevant studies according to the grades.

P.A. da Mota Silveira Neto et al. / Information and Software Technology 53 (2011) 407–423

Appendix B. List of conferences

Acronym | Conference name
AOSD | International conference on aspect-oriented software development
APSEC | Asia Pacific software engineering conference
ASE | International conference on automated software engineering
CAiSE | International conference on advanced information systems engineering
CBSE | International symposium on component-based software engineering
COMPSAC | International computer software and applications conference
CSMR | European conference on software maintenance and reengineering
ECBS | International conference and workshop on the engineering of computer based systems
ECOWS | European conference on web services
ECSA | European conference on software architecture
ESEC | European software engineering conference
ESEM | Empirical software engineering and measurement
FASE | Fundamental approaches to software engineering
GPCE | International conference on generative programming and component engineering
ICCBSS | International conference on composition-based software systems
ICSE | International conference on software engineering
ICSM | International conference on software maintenance
ICSR | International conference on software reuse
ICST | International conference on software testing, verification and validation
ICWS | International conference on web services
IRI | International conference on information reuse and integration
ISSRE | International symposium on software reliability engineering
MODELS | International conference on model driven engineering languages and systems
PROFES | International conference on product focused software development and process improvement
QoSA | International conference on the quality of software architectures
QSIC | International conference on quality software
ROSATEA | International workshop on the role of software architecture in testing and analysis
SAC | Annual ACM symposium on applied computing
SEAA | Euromicro conference on software engineering and advanced applications
SEKE | International conference on software engineering and knowledge engineering
SERVICES | Congress on services
SPLC | Software product line conference
SPLiT | Software product line testing workshop
TAIC PART | Testing: academic and industrial conference
TEST | International workshop on testing emerging software technology
WICSA | Working IEEE/IFIP conference on software architecture

Appendix C. List of journals

ACM Transactions on Software Engineering and Methodology (TOSEM)
Communications of the ACM (CACM)
Elsevier Information and Software Technology (IST)
Elsevier Journal of Systems and Software (JSS)
IEEE Software
IEEE Computer
IEEE Transactions on Software Engineering
Journal of Software Maintenance Research and Practice
Software Practice and Experience Journal
Software Quality Journal
Software Testing, Verification and Reliability

References

[1] W. Afzal, R. Torkar, R. Feldt, A systematic mapping study on non-functional search-based software testing, in: SEKE 08: Proceedings of the 20th International Conference on Software Engineering and Knowledge Engineering, Redwood City, California, USA, 2008, pp. 488–493.
[2] W. Afzal, R. Torkar, R. Feldt, A systematic review of search-based testing for non-functional system properties, Information and Software Technology 51 (6) (2009) 957–976.
[3] J. Al-Dallal, P.G. Sorenson, Testing software assets of framework-based product families during application engineering stage, Journal of Software 3 (5) (2008) 11–25.
[4] P. Ammann, J. Offutt, Introduction to Software Testing, 1st ed., Cambridge University Press, 2008.
[5] J. Bailey, D. Budgen, M. Turner, B. Kitchenham, P. Brereton, S. Linkman, Evidence relating to object-oriented software design: a s
[11] P. Brereton, B.A. Kitchenham, D. Budgen, M. Turner, M. Khalil, Lessons from applying the systematic literature review process within the software engineering domain, Journal of Systems and Software 80 (4) (2007) 571–583.
[12] D. Budgen, M. Turner, P. Brereton, B. Kitchenham, Using mapping studies in software engineering, in: Proceedings of PPIG Psychology of Programming Interest Group 2008, Lancaster University, UK, 2008, pp. 195–204.
[13] L. Chen, M.A. Babar, N. Ali, Variability management in software product lines: a systematic review, in: SPLC 09: Proceedings of the 13th Software Product Line Conference, San Francisco, CA, USA, 2009.
[14] M.B. Cohen, M.B. Dwyer, J. Shi, Coverage and adequacy in software product line testing, in: ROSATEA 06: Proceedings of the ISSTA 2006 Workshop on Role of Software Architecture for Testing and Analysis, ACM, New York, NY, USA, 2006, pp. 53–63.
[15] N. Condori-Fernandez, M. Daneva, K. Sikkel, R. Wieringa, O. Dieste, O. Pastor, A systematic mapping study on empirical evaluation of software requirements specifications techniques, in: ESEM 09: Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, Washington, DC, USA, 2009, pp. 502–505.
[16] C. Condron, A domain approach to test automation of product lines, in: SPLiT 04: International Workshop on Software Product Line Testing, Boston, MA, USA, 2004, pp. 27–35.
[17] C. Denger, R. Kolb, Testing and inspecting reusable product line components: first empirical results, in: ISESE 06: Proceedings of the International Symposium on Empirical Software Engineering, New York, NY, USA, 2006, pp. 184–193.