

Lessons from Three Years of Inspection Data

EDWARD F. WELLER, Bull HN Information Systems

Metrics from more than 6,000 inspection meetings provide a solid empirical foundation for analyzing issues related to the inspection process.

In April 1990, Bull HN Information Systems' Major Systems Division began an inspection program as part of the standard development process for its GCOS 8 operating system. Data collection was crucial to the program's early acceptance, and the data collected over the last three years has been used to highlight potential problems and direct efforts toward continuous process improvement. The project team's experience with the inspection program has given the development staff much insight into both the benefits and drawbacks of the inspection process. The lessons we learned were based on the use of metrics collected from more than 6,000 inspection meetings.

The GCOS 8 operating system is the descendant of the GECOS 3 operating system developed more than 25 years ago for General Electric Corp.'s 625/635 mainframe systems. It currently runs on the DPS 9000 series of mainframes.


"GCOS 8" collectively refers to all the system software, firmware, and test and diagnostic software, a total of more than 11 million lines of source code. About 400,000 to 600,000 lines of code are added annually, depending on the release cycle.

Table 1 summarizes inspection data for the last two-plus years. The increasing number of defects between 1991 and 1992 suggests that inspections became more effective because the number of defects removed increased faster than the number of meetings or size of material inspected. (Here, defect means major defects that would result in a visible failure if not corrected.) One reason could be that the work-product type has not changed, which means the inspection process has had a chance to stabilize.

In this article, I describe the methods used to collect and analyze the data and provide feedback to the development staff. I also examine the data from the point of view of the entire organization and as it applies in individual case studies of particular project types. It is my hope that our experience and the lessons we have learned will help others better understand the usefulness of the inspection process as a tool for improving software-development quality and productivity.

ORGANIZATION AND DATA COLLECTION

Our company's software-engineering process group maintains records of data collected by project-inspection coordinators, who also perform a data-sanity and -integrity check before passing it to the group. In a data-sanity check, the coordinators make sure attributes reported, such as number of lines of code, are indeed correct. In a data-integrity check, they make sure that the correct number of defects have been reported, for example, or that defects have not been labeled minor when they are major.
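As a rough illustration of what such checks involve, here is a minimal sketch in Python; the record fields and rules are illustrative assumptions, not the coordinators' actual procedure.

def sanity_check(record):
    """Verify reported attributes are plausible (a data-sanity check)."""
    errors = []
    # Hypothetical fields: size of inspected material must make sense.
    if record["lines_inspected"] <= 0:
        errors.append("size of material inspected must be positive")
    if record["new_and_changed_lines"] > record["lines_inspected"]:
        errors.append("new/changed lines cannot exceed size inspected")
    return errors

def integrity_check(record):
    """Verify defect counts and severity labels are consistent (a data-integrity check)."""
    errors = []
    majors = [d for d in record["defects"] if d["severity"] == "major"]
    minors = [d for d in record["defects"] if d["severity"] == "minor"]
    if len(majors) + len(minors) != record["reported_defect_count"]:
        errors.append("reported defect count does not match logged defects")
    # Per the article's definition, a major defect causes a visible failure.
    for d in majors:
        if not d.get("visible_to_user", True):
            errors.append("defect labeled major but not visible to the user")
    return errors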

Data is collected in several ways. Projects may collect and analyze their data before forwarding it to the process group. In other parts of the organization, the data is given to project-inspection coordinators or to the secretarial staff for data entry before it is forwarded to the process group. This leads to a "hierarchical" collection of data, which can obscure the source of defects by the time the process group sees it.

When the process group does finally receive the data, they, along with inspection coordinators and other software-process engineers, analyze the data using a variety of PC-based tools and Bull's mainframe Interel relational database. Interel includes information about size (in thousands of lines of code) from our configuration-management system, as well as test- and field-defect data. Although this data is not yet fully integrated in the database, this combination lets us track product or module metrics beyond the inspection stage. This, in turn, lets us correlate data on defects at the shipping stage with inspection records for those defects.

Training. Training consists mostly of consultants providing inspection classes to all members of our technical staff, many first- and second-level managers, and members of our product-line-management group. These classes include a standard inspection-process class that covers the Fagan method, a process developed by Michael Fagan to examine work products for defects.1 There is also a class tailored to the preparation and inspection of requirements documents. The project-inspection coordinators hold weekly meetings, at which they discuss inspection-process improvements and inspection metrics. These meetings are also a form of training.

Inspection metrics. We collect the usual inspection metrics: time to prepare for the inspection, time to conduct the inspection, size of material inspected, and data on any defects. We also record a significant amount of data about the work product being inspected, which lets us conduct a causal analysis of the defects. We can then correlate the inspection data with projects, test data, and field data. Some of the metrics and identifiers are

• Inspection type. What type of work product is it? Was the work product tested before inspection? Is this a fix for a reported problem? Is this a first-time inspection or a reinspection?

• Process. Who moderated the inspection? What were the details of the inspection meeting (like time)? How large was the inspection team? What were the preparation and inspection times? What was the size of the material inspected? Were there changes in the material inspected, or was new material added?

• Causal analysis. What is the module ID? What are details about the defect? How much time was spent in rework? Causal analysis also includes failure reports, like Internal Software Notes, which report test errors, and System Technical Action Requests, which report errors after the work product is shipped.

Table 1. Inspection data summary.

Data category                                                1990*    1991    1992
Code-inspection meetings                                     1,500    2,431   2,823
Document-inspection meetings
  (anything other than code inspection)                         54      257     348
Design-document pages inspected                              1,194    5,419   6,870
Defects removed                                              2,205    3,703   5,649

* The program started in May 1990, so figures represent May to December 1990.

Collection method. We began collecting data using paper forms. For the first six months, process engineers responsible for the inspection program reviewed all inspection reports before they entered the data into the database. Having engineers enter data also gave us a way to check it one last time before entry. Early on, we recognized that the amount of data would become unwieldy as the inspection process expanded throughout the site. We developed a PC-based database program to retain the inspection records, including a tool that let us collect data by having

• a recorder, who could be a team member responsible for logging the defects, enter the data on a PC during the inspection meeting,

• the recorder or project-inspection coordinator enter data from paper forms after the meeting, and


• secretaries enter data from paper forms.

Most inspection teams prefer the last two methods, since the recorder is generally too distracted with entering the results on the PC to participate effectively in the inspection meeting. The last two methods are not without disadvantages, however. Errors are sometimes introduced, which necessitates a database search for inaccurate entries. The secretaries are not sufficiently skilled to check data integrity. But these disadvantages are not enough to overcome team members' natural resistance to paperwork, a syndrome typical of most metrics programs.

Lesson 1 about data collection is you may have to sacrifice some data accuracy to make data collection easier.

The two entries most often in error are the size of inspected material (in our case, size of the module inspected) and defect severity. Errors in the first category are due to confusion over "size of material inspected" versus "new and changed material" (lines added or changed and then inspected) and clerical errors.

It is hard to discover these input errors once the data has been merged at the department level because the hierarchy of reporting tends to blur the errors of individual projects. On more than one occasion, we have found that higher-than-desired inspection and preparation rates are the result of bad records. To avoid these errors in the future, we plan to add data-sanity checks to the collection tool for both size and rates. Meanwhile, we have added a reporting capability that provides preparation and inspection rates, defect rate, and inspection hours per defect to the collection tool so that recorders and coordinators can become more proficient at catching entry errors.
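The derived values in that report reduce to four standard ratios. Here is a minimal sketch in Python; the formulas are the conventional inspection-metric definitions, and the argument names are assumptions rather than the tool's actual fields.

def derived_rates(prep_hours, meeting_hours, lines, major_defects):
    total_hours = prep_hours + meeting_hours
    return {
        # lines covered per hour of individual preparation
        "preparation_rate": lines / prep_hours,
        # lines covered per hour of inspection meeting
        "inspection_rate": lines / meeting_hours,
        # major defects found per thousand lines of code
        "defect_density": 1000.0 * major_defects / lines,
        # effort to find one major defect (undefined if none were found)
        "hours_per_defect": total_hours / major_defects if major_defects else None,
    }

# Example: 6 preparation hours and 3 meeting hours over 600 lines with
# 5 major defects gives 100 and 200 lines per hour, about 8.3 defects
# per KLOC, and 1.8 hours per defect.
print(derived_rates(prep_hours=6, meeting_hours=3, lines=600, major_defects=5))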

The cause of the errors in reporting defect severity was more complex. Most of the problem stemmed from the defect classifications "major" and "minor." Despite our clear differentiation of the two in inspection-training classes and documentation, those reporting the error often used other classification criteria; namely, how easy the defect was to correct and how severely it would affect the user.

We believe this tendency to misclassify defects is due to

• a reluctance to classify defects as major until the person responsible for the defect is certain that the defect data would not be visible to his manager for possible use in a performance appraisal,

• a general confusion of how difficult the defect is to fix with how severe it is, and

• a misunderstanding about "severity" as the customer sees it or as it is defined in the context of testing.

To clarify the confusion and misunderstanding surrounding the terms "major" and "minor," we recommend that projects just beginning an inspection process change the terms to "visible to the user" and "not visible to the user." This is less threatening, eliminates confusion about how difficult the defect is to fix, and was (at least in our case) not confusing with prior terminology. We do not recommend that projects switch terms in midstream, of course.

Lesson 2 about data collection is avoid metric definitions that are similar to previously used metrics, or that are loaded terms.

Data classification. We introduced inspections in the middle of a major system release, while coding and testing activities were underway. Initial inspection metrics indicated widely divergent defect-detection rates. By classifying them according to the development stage in which inspection took place, we found a much greater consistency in defect rate. We used the following code classifications:

• Code inspected before unit test (new code).

• Reinspections.

• Code inspected after unit test.

• Fixes to problems reported during test or by the customers.

For our document inspections (which we did in addition to code inspections), we found 16 document types: two types of requirements, high- and low-level design, three types of test plans, three types of test cases, three types of test specifications (unit, integration, and system), project plans, user manuals, and process documents. The inspection metrics in these groups were sufficiently different to warrant the classifications.
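To show how such classifications can be used to group metrics, here is a minimal sketch in Python; the categories mirror the code classifications above, but the encoding and field names are my assumptions.

from enum import Enum

class CodeClass(Enum):
    # The four code classifications described in the text.
    NEW_BEFORE_UNIT_TEST = "code inspected before unit test"
    REINSPECTION = "reinspection"
    AFTER_UNIT_TEST = "code inspected after unit test"
    FIX = "fix to a problem reported in test or by customers"

def density_by_class(records):
    """Average major defects per KLOC, grouped by code classification."""
    totals = {}
    for r in records:
        d, n = totals.get(r["code_class"], (0, 0))
        totals[r["code_class"]] = (d + r["major_defects"], n + r["lines"])
    return {cls: 1000.0 * d / n for cls, (d, n) in totals.items() if n}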

DATA ANALYSIS

The divergent metrics we found before we classified the data could be attributed to a number of factors that have to do with organization-wide trends and attitudes toward inspection. These include reluctance to inspect code before testing, inspection-team size and effectiveness, the variety of high-level design languages, and process stability.


Inspection before test. As Figure 1 shows, the defect-detection rates before and after unit test differ considerably. Code inspection after unit test continues to be more popular even though data has proved that this method of inspection catches far fewer defects. Glen Russell has identified two reasons for not testing before inspections.2 The first has to do with motivation. The inspection process relies on a motivated team, and testing creates the illusion that the product works. The second has to do with rework. When testing is extensive and the code producer has already logged many hours in creating the product, he may be unwilling to accept optimizations that would require major rework.

We found Russell's first reason to be true in several of our other projects, which inspected after unit test and had a lower defect-detection rate during that inspection, while also having more defects after shipping.

Thus, there are four basic disadvantages of inspecting after unit test:

• Unit test lowers the motivation of the inspection team to find defects because it gives inspectors false confidence in the product. They "know the product works," so why inspect?

• When large amounts of code are involved, optimizations can be very disruptive because all the code segments must be tested together in a unit test. When you inspect before unit test, you can generally inspect smaller work segments earlier in development. When inspections are done after unit test, the inspection team or project manager may be tempted to say, "We can't inspect all this code now. We're ready to enter integration test." We faced this problem shortly after we began the inspection program, when we ended up doing an inspection after unit test. About 15,000 lines of batched code were waiting to be inspected, and it was very hard not to simply bypass the inspection process.

• Code inspections may uncover an underlying design defect not detected in design inspections, possibly resulting in considerable work if the inspections are postponed until after unit test. For example, in some of our new-code inspections, we discovered a serious design error months before the product was due to enter test, which let us recover without significantly altering the schedule.

• When you do unit test first, you miss the opportunity to save time and resources. Sometimes, if inspection results are good enough, you can bypass unit test altogether and do a more economical integration test.

The lesson is inspect before unit test.

Inspection teams. We investigated the effect of inspection-team size on defect-detection rates using data from more than 400 inspections that discovered defects. We found that four-person teams were twice as effective, and more than twice as efficient, as three-person teams, which we attributed to two causes. First, the code producer is one third of the three-person team, but only one fourth of a four-person team. Second, a fourth inspector is a good indication that team members as a whole have a higher level of product knowledge. Inspection teams require at least three people, so the only valid reason for adding a fourth inspector is to increase the team's level of expertise. Three-person inspection teams may have members with a lower level of product knowledge because they may be serving as bodies to fill a quota.

Figure 2 shows two relationships of defect-detection rates versus preparation rates (the rate at which code was reviewed during individual preparation). The data shows three- and four-person code inspections, with each group further divided into preparation rates of greater than and less than 200 lines per hour. The four-person teams always outperform the three-person teams, and teams with lower preparation rates detect more defects.

Interestingly, a three-person inspection team with a low preparation rate seems to do as well as a four-person inspection team with a high rate. This may be because one person has a very high preparation rate, not necessarily because everyone has a slightly higher rate. The preparation-rate analysis supports our earlier supposition that four team members have a greater overall product knowledge than three members.

The lesson about team effectiveness is an inspection team's effectiveness and efficiency depend on how familiar they are with the product and what their preparation rate is. Project managers must consider these factors as they develop their resource and schedule plans.
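The grouping behind Figure 2 reduces to a few lines of code. Here is a minimal sketch in Python; the 200-lines-per-hour threshold comes from the analysis above, while the record fields are assumed for illustration.

def figure2_groups(inspections):
    """Defect-detection rate per KLOC, grouped by team size and prep rate."""
    groups = {}  # (team_size, rate band) -> [defects, lines]
    for i in inspections:
        # lines reviewed per hour of individual preparation
        prep_rate = i["lines"] / i["prep_hours"]
        band = "over 200 lines/hr" if prep_rate > 200 else "under 200 lines/hr"
        bucket = groups.setdefault((i["team_size"], band), [0, 0])
        bucket[0] += i["major_defects"]
        bucket[1] += i["lines"]
    return {key: 1000.0 * d / n for key, (d, n) in groups.items()}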

Design-document data. We have had more difficulty analyzing data from the inspection of documents than from code inspections. There are so many design methodologies and types of work (structured analysis and design, various program-design languages, and structured or unstructured English) that we have insufficient data on each method to do a complete analysis, although we are recording the language used for these documents.

Figure 3. High-level design inspections: defects per document page versus document size (pages).

Figure 3 shows data from high-level design inspections. Defects per document page have changed over time, inspection size appears to be under better control (the "knee" in the scatter points has moved from 20 to 15 pages, and the defect-detection rate is higher), and inspection yield (number of defects detected) has improved. There are two possible explanations for the higher defect-detection rate. The first is that we were able to clarify the definition of defect severity during our start-up phase ("major defect" not meaning a defect that is complex or hard to fix, but rather one that is visible to the user). The second is that we were able to better control the amount of material inspected in one meeting.

The lesson (one that revisits lesson 2 under data collection) is ensure that the definitions of the metrics are clearly understood, and that guidelines for size of material inspected are followed.

Process stability. We have used the overall defect-density rates to measure the stability of the inspection process. Figure 4 plots the relative code and document-inspection yields for three years.

The dip in the first half of 1991 is due to a transition that occurred at that time. The initial data was provided primarily by a small number of pilot projects that received special attention from the process group and senior management. This group had lots of support for any changes and a suitable work product. As the process was applied to the entire site in late 1990 and 1991, the process-group support that had been focused on the pilot projects was diluted across more projects.

To combat this dilution, I and later my colleague Robin Fulford provided the project-inspection coordinators with a data analysis of inspection and preparation rate, size of material, and other process metrics. Because we used actual data to emphasize the relationship of process metrics to detection rates, inspection teams were able to better understand why it was important to adhere to the recommended process metrics. As Figure 4 shows, the trend reversed and stabilized in the second half of 1991.

The lesson is expect a drop in effectiveness as you transition from pilot to general use, and be prepared to address the problems with process metrics.

CASE STUDIES

An organization-wide analysis of inspection data can help you assess the health of the inspection process, but the best use of inspection data is at the project level. Because the data is more likely to be from work on a common design, and from a more consistent set of inspectors, data analysis provides more useful information and guidance to the project members.

The case studies that follow demonstrate both the pluses and minuses of the inspection process. In some cases, the literal interpretation of the data will be incorrect. All four case studies presented are drawn from our code-inspection data.

Project A. This project illustrates how the inspection team can use metrics to evaluate the process. One of our pilot projects consisted of having inspectors collect and use additional metrics to measure their inspection effectiveness. By developing control limits that represented the norm for their inspections, the inspectors could measure each inspection against the expected preparation and inspection rates, yield in defects per thousand lines of code, and cost per major defect. They also kept reasonably accurate defect rates from unit tests.

The benefit of collecting this data, and understanding its value, was demonstrated during unit test. The code generated by the C compiler for one of the control microprocessors was inefficient and had to be replaced with Forth for several timing-critical routines. The team initially decided not to inspect the rewritten code (just do a simple conversion of 1,200 lines of C!), but testing the rewritten code was taking about six hours per failure. Because they knew they had been finding defects in inspections at a cost of 1.43 hours per defect, the team stopped testing and inspected the rewritten code. These inspections initially found defects at a cost of less than one hour per defect.

The defect density dropped significantly after the third inspection meeting, indicating the causal analysis of the defect source had eliminated the defects. Without the data that showed the relative costs, it would have been harder to back off the unit test. The team would have been fighting management pressure to continue debugging, as well as the natural resistance to inspection in favor of debugging.

Data at the end of unit test indicated that the code inspection before test was 80 percent effective (defects found by inspection/total number of defects). Effectiveness after system test was around 70 percent (inspections found 70 percent of all the defects detected after code was completed).
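Both of Project A's key numbers reduce to simple ratios. The sketch below, in Python, shows the computations; the 80-percent and 1.43-hour figures come from the text, while the defect counts in the example are assumed for illustration.

def effectiveness(inspection_defects, defects_found_later):
    """Fraction of all known defects that inspection caught."""
    return inspection_defects / (inspection_defects + defects_found_later)

def cost_per_defect(hours_spent, defects_found):
    """Average hours of effort per defect found."""
    return hours_spent / defects_found

# If, say, inspections found 80 defects and unit test later found 20
# more, effectiveness is 80 / (80 + 20) = 0.80, the 80 percent reported
# at the end of unit test. The team's decision rule compared 1.43 hours
# per defect in inspection against roughly 6 hours per failure in test.
assert effectiveness(80, 20) == 0.80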

The lesson is get the data into the users' hands and they will use it.

Project B. This project, another one of the first to use code inspections, illustrates that good inspection results do not automatically mean the project will succeed. In this project, coding started just as the first inspection workshop was being offered. Consequently, the requirements and design documents had not been inspected. The project was also a prototype for a technology we had not previously used, which greatly complicated the problem. The team would have been able to find the design errors much more easily had they been familiar with the technology.

We inspected more than 12,000 lines of C, with an average defect density of 23 defects per thousand lines of code. The project was strongly motivated to perform inspections, and the project leader kept good records of each inspection. Indeed, in an analysis of 56 code inspections, both the preparation and inspection rates were within one standard deviation, indicating that the inspection process was under control.

With this kind of inspection data, you would think that the project would be successful. In fact, the team was given special mention during project reviews for their use of the inspection process. However, during the code inspections, the project leader observed that the code inspections were assuming a correct design, an assumption that may not have been warranted, since the team had not inspected the design documents.

The project leader's misgivings were well-founded. The project was not successful, and in a postmortem review, the team used the inspection data to pinpoint one of the key problems: Code growth during unit and integration test was almost 100 percent because of the continually unfolding requirements.

A comparison of defect data gathered during inspections and unit test showed that inspection defects were mostly coding or low-level design errors; defects found in test, on the other hand, were requirements and architectural-design defects, which had been primarily responsible for code growth.

Although code growth had not really been a surprise to the development team or management, the underlying reasons for that growth were not well understood. The detailed data highlighted specific design problems and let the development team base their conclusions on numbers and facts rather than conjecture and instinct. Management decided to place the product on hold because the data showed that it clearly lacked the desired quality. The data was also used to justify process changes for future projects involving unfamiliar technology.

The lesson is good inspection results can create false confidence. Inspections are not a silver bullet. Be sure to inspect all basic design documents.

Project C. This project illustrates how inspection can improve the quality of maintenance fixes.

Figure 5. Monthly defect rates in fixes to shipped products from 1989 to 1992. When inspections for fixes became mandatory, the percentage of defective fixes dropped to about a third of what it was. In the last year, the improvement has dropped to about half, possibly because of the Hawthorne effect, in which something improves because it is being observed. The outlier in September 1991 was created by the small number of fixes during that month.


Table 2. Inspection rate (lines per hour) and defect-detection rate (per thousand lines of code), by type of code.

Repair of field defects is one of the most error-prone activities in software development; repair-defect rates can be as high as 50 percent for one-line changes, 75 percent for changes involving five lines, decreasing to 35 percent for changes involving 20 lines.3

Before we instituted inspections, we had collected the defective-fix rate (the number of fixes replaced because of failures in the original fix) for almost one year in an attempt to improve the reliability of our fixes.

In August 1990, inspections for all fixes became mandatory. Figure 5, which depicts monthly defect rates in fixes to shipped products, shows the percentage of defective fixes dropped to about a third of what it was about this time. In the last year, the improvement has dropped to about half. (In September 1991, there were few fixes, so the small number of defects for that month created an outlier. Also, the Hawthorne effect, in which something being monitored improves simply because it is being observed, might have contributed to lower curves in the first half of 1992.)

When we correlated the reduced number of replaced fixes with the inspection data, we discovered an interesting anomaly. Only 13 percent of the inspections had detected defects, much less than the observed improvement for the first year, as measured by customer use. In other words, we could attribute only half the initial improvement to the defects found by inspections. We speculate that the inspection process could have had the initial side benefit of improving product quality independently of any defects the inspection uncovered. The initial improvement might also have been due to the Hawthorne effect. To determine its cause, we are currently examining the data in more detail to isolate possibilities, such as the size of the changes and any shift in the source of the defects. Without the causal-analysis metrics described earlier, this detective work would be impossible.

The lesson is inspections can improve the quality of maintenance fixes.

Project D. This project illustrates that inspections can be effective even when the data seems to indicate otherwise.

In this project, we were to inspect a new data-access service module and 112 modified modules with which it interfaced. In the 18,000 lines of code we inspected, we found 265 major defects. Table 2 shows the inspection and defect rates for this code.

Looking at the data in isolation, and observing the variance in inspection rates, it might seem natural to question the effectiveness of the new-service-module inspection (because the inspection rate was high and few defects were found). In fact, although we expected a higher defect rate in test, both inspections were equally effective.

A closer investigation of the new-service-module inspection data explained the high rate and low yield. The design used a large table of data that was well structured, which let us rapidly inspect a large segment of code and consequently have a lower defect density. The different rates indicated that the restructured design and code had fewer defects than the modified code.

Lesson 1 in this project is investigate the work product before deciding that process metrics indicate an ineffective inspection process.

Inspection data from this project provided another benefit. After the development team did a unit test on the first six modified modules without discovering any defects, they skipped the unit test of the remaining modules because they believed the inspection results warranted it. Consequently, they saved significant resources and time because unit test of this part of the operating system is lengthy and the test cases are rerun in integration test. Their decision was a trade-off of time saved in one activity against the potentially higher costs of defects in the next activity. The trade-off would be a good one only if the second activity had a small number of defects. The decision turned out to be wise; only four defects have been discovered in a year of unit test, integration test, and continued use of the product by other projects, against the 298 defects found by inspections.
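The team's trade-off can be framed as a break-even comparison. The sketch below, in Python, is illustrative only; apart from the four escaped defects, none of the numbers come from the project.

def skip_unit_test_pays_off(hours_saved, escaped_defects, hours_per_escaped_defect):
    """Skipping a test stage pays off when the effort saved exceeds the
    expected cost of the defects that slip through to the next stage."""
    return hours_saved > escaped_defects * hours_per_escaped_defect

# With, say, 400 hours of unit-test effort saved, and the four defects
# that actually escaped costing an assumed 20 hours each to find and fix
# downstream, skipping was clearly the right call: 400 > 4 * 20.
print(skip_unit_test_pays_off(hours_saved=400, escaped_defects=4,
                              hours_per_escaped_defect=20))  # True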

Although this 98.7-percent inspection effectiveness (298/302) will undoubtedly drop during system test, the results show the power of inspections to influence the course of development.

Frank Ackerman and colleagues have suggested that inspections can replace unit testing, but not later stages of testing, and that benefits in later stages are mainly productivity gains.4 This project supported these suggestions. We were able to execute the unit-test suite in integration test to ensure that inspections did not miss categories of defects that are difficult to detect by inspection. We also effectively bypassed unit test with this product, and were able to do a "big bang" module integration (test many changes at once) rather than the usual continuous integration (accumulate a few changes at a time). Our test cost was essentially the fixed cost of the test, or the cost of running the test programs one time.

Lesson 2 from this project (one that underlines some points I made earlier) is inspections can replace unit test with significant cost savings.

Analysis. We can learn several things about the inspection process from these case studies.

• Inspections can instrument the development process. Inspection data can be used by the development staff in real time to make decisions about which process is best to follow. In project A, the users had access to the metrics, and in project D, the development staff changed the test process.

You can also use inspection data to better understand the dynamics of the development process and provide hard data for decision making. In project B, the data was used to determine that the project was not successful. In project C, data was used to determine causes for a recent increase in the defect rate.

Inspection data collected from earlier development activities, from requirements through low-level design, has provided measurable indicators of progress during these activities. This hard evidence of progress has helped to forestall the inevitable query, "Why isn't Sam coding yet?" because managers have a way to measure the completion and quality of the requirements and design work products.

• No matter how well they are executed, inspections cannot overcome serious flaws in the development process. In project B, the underlying faults were masked by the apparent progress reported by the inspection results. My advice to projects starting code inspections is to carefully evaluate the review process used to verify the requirements analysis and architectural design because inspection is more likely to detect code and low-level design defects but much less likely to detect requirements and high-level design defects.

The data gathered from design inspections shows trends similar to those for code inspections: In both types of inspections, high preparation rates and larger sizes tend to impede the ability to detect defects. Even knowing this, we are finding it difficult to lower the preparation rates.

Overnight changes just do not happen. We are continuing to work on this problem by

• providing data analysis to the development teams during inspection. We have had several successes when we fed the data back to the providers.

• evaluating the level of detail or completeness of the design documents. In several cases, the high preparation rates and low detection rates are directly attributable to sparse design documentation.

Inspections can greatly improve software-development quality and productivity when users have first-hand access to the inspection data, understand the metrics and their effect on the inspection process, and use sound software-engineering methods and processes. When requirements are poorly understood, or subject to significant changes, the inspection process has little effect on the project's eventual success or failure, although inspection data is valuable in postmortems.

The most pressing need in the future is to better understand data from requirements and design inspections. A key element required to compare defect densities and other rates is the ability to produce documents with consistent levels of detail across projects. We have just begun experimenting with n-fold inspections5 of requirements specifications (in which two or more teams inspect the same document) as a way to improve detection.

The inspection process is not perfect, but it can prove a valuable tool for software developers and management in predicting a product's quality and reliability.

ACKNOWLEDGMENTS

Without the data provided by the software-development staff in Bull's Major Systems Division and the efforts of Robin Fulford to collect and maintain the inspection database, this article would not have been possible. I also thank the IEEE Software referees for their helpful comments.

REFERENCES

1. M. Fagan, "Design and Code Inspections to Reduce Errors in Program Development," IBM Systems J., No. 3, 1976, pp. 219-248.
2. G. Russell, "Experiences with Inspections in Ultralarge-Scale Developments," IEEE Software, Jan. 1991, pp. 25-31.
3. G. Weinberg, "Kill That Code!" IEEE Tutorial on Software Restructuring, IEEE CS Press, Los Alamitos, Calif., 1986, pp. 130-131.
4. F. Ackerman, L. Buchwald, and F. Lewski, "Software Inspections: An Effective Verification Process," IEEE Software, May 1989, pp. 31-36.
5. J. Martin and W. Tsai, "N-Fold Inspections: A Requirement Analysis Technique," Comm. ACM, Feb. 1990, pp. 225-232.

Edward F. Weller is a principal fellow at Bull HN Information Systems, Enterprise System Operations. His responsibilities include monitoring the software-development processes used by the Large-System Software Products group, which developed GCOS 8, and activities to improve process ratings as measured by the SEI Capability Maturity Model. His prior interests are hardware design, diagnostic and design-verification software, and system-maintainability design.

Weller received a BSEE from the University of Michigan and an MSEE from the Florida Institute of Technology. He is one of the cofounders and current cochair of the Software Inspection and Review Organization, a senior member of the IEEE, and a member of the ACM and Sigma Xi.

Address questions about this article to Weller at Bull HN Information Systems, 13430 N. Black Canyon Hwy., Phoenix, AZ 85029; Internet [email protected].
