REASONING ABOUT FLAWS IN SOFTWARE DESIGN: DIAGNOSIS AND RECOVERY
TAMARA LOPEZ
Supervisors: Marian Petre, Charles Haley and Bashar Nuseibeh
Department: Computing
Status: Full-time
Probation Viva: Before
Starting Date: February 2010
Since its diagnosis at the 1960s NATO conferences as one of the key problems in
computing [1, 2], the provision of reliable software has been a core theme in software
engineering research. One strand of this research analyzes software that fails, while
a second develops and tests techniques for ensuring software success.
Despite these efforts, the threat of failure and the quest for a multivalent yet
comprehensive “sense” of quality [2] remain powerful drivers for research and
provocative tropes in anecdotal accounts of computing [3]. However, current analytical
approaches tend to result in overly broad accounts of why software fails, or in
overly narrow views of what is required to make software succeed. This suggests
a need for a different approach to the study of failure, one that can address
the complexities of large-scale “systems-of-systems” [4, 5] while accounting for the
effects and trajectories of specific choices made within software initiatives.
To address this gap, this research asks: How does failure manifest in actual soft-
ware development practice? What constitutes a flaw, and what are the conditions
surrounding its occurrence and correction? What can adopting a situational ori-
entation tell us more generally about why some software fails and other software
succeeds?
Background
Within computing literature, failure analysis typically takes two perspectives:
2010 CRC PhD Student Conference
• Systemic analyses identify weak elements in complex organizational,
operational and software systems. Within these systems, individual or
multiple faults become active at a moment in time, or within a clearly
bounded interval of time, and result in catastrophic or spectacular
operational failure [6, 7]. Alternatively, software deemed “good enough” is
released into production with significant problems that require costly
maintenance, redesign and redevelopment [5, 8].
• Means analyses treat smaller aspects or attributes of software
engineering as they contribute to the goal of creating dependable software [4].
These studies develop new techniques, or test existing ones, to strengthen all
stages of development, including requirements engineering [9], architectural
structuring [10], testing and maintenance [11], and verification and validation [12].
Systemic analyses produce case studies and often do not conclude with specific,
precise reasons for failure. Instead, they retrospectively identify the system or
subsystem that failed and provide general recommendations for improvement.
Even when they do isolate weaknesses in the processes of software creation
or in particular software components, they do not produce general frameworks or
models that can be extended to improve software engineering practice.
Means analyses employ a range of methods, including statistical analysis, program
analysis, case study development, formal mathematical modeling and systems analysis.
Frequently, they examine a single part of the development process, with a
corresponding focus on achieving a single dependability means [4]. The studies are often
experimental, applying a set of controlled techniques to existing bodies of
software in an effort to prove, verify and validate that the software meets a
quantifiable, pre-determined degree of “correctness”.
Methodology
This research will produce an analysis of the phenomenon of failure that lies
somewhere between the broad, behavioral parameters of systemic analyses and the
narrowly focused goals of means analyses. To do this, it will draw upon recent
software engineering research that combines the socially oriented qualitative
approaches of computer supported cooperative work (CSCW) [13] with existing software
analysis techniques to provide new understandings of longstanding problems in
software engineering. In one such group of studies, de Souza and collaborators
have expanded the notion of dependency beyond its technical emphasis on the
ways in which software components rely on one another, demonstrating that hu-
man and organizational factors are also coupled to and expressed within software
source code [14, 15]. In a study published in 2009, Aranda and Venolia made a case
for developing rich bug histories using qualitative analyses in order to reveal the
complex interdependencies of social, organizational and technical knowledge that
influence and inform software maintenance [16].
In the manner of these and other studies of cooperative and human aspects of
software engineering (CHASE), the research described here will apply a combination of
analytic and qualitative methods to examine the role of failure in the software
development process as it unfolds. Studies will be designed to allow for the
analysis and examination of flaws within a heterogeneous universe of artifacts,
with particular emphasis given to the interconnections between technical workers
and artifacts. Ethnographically informed techniques will be used to deepen
understanding of how the selected environments operate, and of how notions of
failure and recovery operate within the development processes under investigation.
References
[1] P. Naur and B. Randell, “Software engineering: Report on a conference sponsored by the
NATO Science Committee Garmisch, Germany, 7th to 11th October 1968,” NATO Science
Committee, Scientific Affairs Division NATO Brussels 39 Belgium, Tech. Rep., January
1969. [Online]. Available: http://homepages.cs.ncl.ac.uk/brian.randell/NATO/
[2] J. Buxton and B. Randell, “Software engineering techniques: Report on a conference
sponsored by the NATO Science Committee Rome, Italy, 27th to 31st October 1969,” NATO
Science Committee, Scientific Affairs Division NATO Brussels 39 Belgium, Tech. Rep.,
April 1970. [Online]. Available: http://homepages.cs.ncl.ac.uk/brian.randell/NATO/
[3] R. Charette, “Why software fails,” IEEE Spectrum, vol. 42, no. 9, pp. 42–49, 2005.
[4] B. Randell, “Dependability-A unifying concept,” in Proceedings of the Conference on Com-
puter Security, Dependability, and Assurance: From Needs to Solutions. IEEE Computer
Society Washington, DC, USA, 1998.
[5] ——, “A computer scientist’s reactions to NPfIT,” Journal of Information Technology,
vol. 22, no. 3, pp. 222–234, 2007.
[6] N. G. Leveson and C. S. Turner, “Investigation of the Therac-25 accidents,” IEEE Computer,
vol. 26, no. 7, pp. 18–41, 1993.
[7] B. Nuseibeh, “Ariane 5: Who dunnit?” IEEE Software, vol. 14, pp. 15–16, 1997.
[8] D. Ince, “Victoria Climbié, Baby P and the technological shackling of British children’s social
work,” Open University, Tech. Rep. 2010/01, 2010.
[9] T. T. Tun, M. Jackson, R. Laney, B. Nuseibeh, and Y. Yu, “Are your lights off? Using
problem frames to diagnose system failures,” in Proceedings of the IEEE International
Requirements Engineering Conference, 2009, pp. v–ix.
[10] H. Sozer, B. Tekinerdogan, and M. Aksit, “FLORA: A framework for decomposing software
architecture to introduce local recovery,” Software: Practice and Experience, vol. 39, no. 10,
pp. 869–889, 2009. [Online]. Available: http://dx.doi.org/10.1002/spe.916
[11] F.-Z. Zou, “A change-point perspective on the software failure process,” Software
Testing, Verification and Reliability, vol. 13, no. 2, pp. 85–93, 2003. [Online]. Available:
http://dx.doi.org/10.1002/stvr.268
[12] A. Bertolino and L. Strigini, “Assessing the risk due to software faults: Estimates of
failure rate versus evidence of perfection,” Software Testing, Verification and Reliability,
vol. 8, no. 3, pp. 155–166, 1998. [Online]. Available: http://dx.doi.org/10.1002/(SICI)1099-
1689(1998090)8:3<155::AID-STVR163>3.0.CO;2-B
[13] Y. Dittrich, D. W. Randall, and J. Singer, “Software engineering as cooperative work,”
Computer Supported Cooperative Work, vol. 18, no. 5-6, pp. 393–399, 2009.
[14] C. R. B. de Souza, D. Redmiles, L.-T. Cheng, D. Millen, and J. Patterson, “Sometimes
you need to see through walls: A field study of application programming interfaces,” in
CSCW ’04: Proceedings of the 2004 ACM conference on Computer supported cooperative
work. New York, NY, USA: ACM, 2004, pp. 63–71.
[15] C. de Souza, J. Froehlich, and P. Dourish, “Seeking the source: Software source code as a
social and technical artifact,” in GROUP ’05: Proceedings of the 2005 international ACM
SIGGROUP conference on Supporting group work. New York, NY, USA: ACM, 2005, pp.
197–206.
[16] J. Aranda and G. Venolia, “The secret life of bugs: Going past the errors and omissions in
software repositories,” in Proceedings of the 2009 IEEE 31st International Conference on
Software Engineering. IEEE Computer Society, 2009, pp. 298–308.