
Slide 1

UFRJ

COPPE

Department of Computer Science, Experimental Software Engineering Group

Fraunhofer Center - Maryland

Some experiences at NTNU and Ericsson with Object-Oriented Reading Techniques

(OORTs) for Design Documents

INTER-PROFIT/CeBASE seminar on Empirical Software Engineering,

Simula Research Lab, Oslo, 22-23 Aug. 2002

Reidar Conradi, NTNU
Forrest Shull, Victor R. Basili, FC-Maryland (FC-MD)

Guilherme H. Travassos, Jeff Carver, Univ.Maryland (UMD)

{travassos, basili, carver}@cs.umd.edu; [email protected] http://www.cs.umd.edu/projects/SoftEng/ESEG/

[email protected], http://www.idi.ntnu.no/grupper/su/


Slide 2

Table of contents

• Motivation and context p. 3

• Reading and Inspections p. 4

• OO Reading and UML Documents, OORTs p. 8

• OO Reading and Defect Types p. 13

• OO Reading-Related Concepts p. 18

• Ex. Gas Station Control System p. 20

• OORT experimentation p. 27

• E0: Post-mortem of SDL-inspections at Ericsson-Oslo, 1997-99 p. 30

• E1: Student OORT experiment at NTNU, Spring 2000 p. 32

• E2: Feasibility OORT study at NTNU, Autumn 2001 p. 40

• E3: Student OORT experiment at NTNU, Spring 2002 p. 42

• E4: Industrial OORT experiment at Ericsson-Grimstad, May 2002 p. 47

• Conclusion p. 50

• Appendix: OORT-1: Sequence Diagram x Class Diagram p. 51

Slide 3

Motivation and context

• Need better techniques for OO reading – UML/Java. But OO software is not a set of simple, “linear” documents.

• Norwegian OORT work started at Conradi’s sabbatical at Univ. Maryland in 1999/2000: Adapting OORT experimental material from CS735 course at UMD, Fall 1999. All artifacts and instructions in English.

• E0. Data mining of SDL inspections at Ericsson-Oslo, 1997-99. Internal database, but marginal data analysis. Done as 3 MSc theses.

• E1. 1st OORT exper. at NTNU, 4th year QA course, March 2000, 19 stud. Leaner reading instructions as Qx.ij, no other changes. Pass/no-pass for stud.

• E2. OORT feasibility study at NTNU: Two MSc NTNU-students, Lars Christian Hegde and Tayyaba Arif, repeating E1 in Autumn 2001.

• E3. 2nd OORT exper. at NTNU, 4th year sw.arch. course, March 2002, 42 stud. Adjusted reading instructions (for E3), removed trivial defects in UML artifacts.

• E4. Industrial OORT exper. at Ericsson-Grimstad, 10 developers, 50/50 on old/new reading techniques, adjusted E3 techniques. Meagre internal baseline data. Part of PROFIT project - developers were paid 500 NOK/h, 4 MSc students (NTNU, HiA) and one PhD student at NTNU/Ericsson (Mohagheghi).

Slide 4

Reading and Inspections

Why read software?

• Reading (reviewing): systematic reading of most software documents/artifacts (requirements, design, code, test data etc.) can increase:

– Reliability: other persons are much better at finding defects (“errors”) created by you; there is often a psychological block on this.

– Productivity: defect fixing is much cheaper in earlier life-cycle phases, e.g. $30 to correct a line in requirements vs. $4000 to fix a code line. And 2/3 of all defects can be found before testing, at 1/3 of the price.

– Understanding: e.g. for maintenance or error fixing.

– Knowledge transfer: novices should read the code of experts, and inversely.

– Maintainability: by suggesting a “better” solution/architecture, e.g. to increase reuse.

– General design quality/architecture: combining some of the above.

• We should not only write software (write-once & never-read?), but also read it (our own and others’). But we need guidelines (e.g. OORTs) to learn to read efficiently, not ad hoc.
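One illustrative way to read the cost claim above (our arithmetic, not from the original slides): if fixing a defect found in testing costs $c$, and fixing it in an earlier reading costs $c/3$, then finding 2/3 of $N$ defects early gives a total cost of

$$\tfrac{2}{3}N\cdot\tfrac{c}{3} + \tfrac{1}{3}N\cdot c \;=\; \tfrac{5}{9}Nc \;\approx\; 0.56\,Nc,$$

i.e. roughly a 45% saving compared to finding all $N$ defects in testing at cost $Nc$.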

Slide 5

Reading and Inspections (2)

• Logistical problems: takes effort, how to schedule and attend meetings?

• Proper skills are unavailable: needs domain and/or technological insight.

• Sensitive criticism: may offend the author, so counter-productive?

• Rewards or incentives: may be needed for both readers and writers.

• Boring: so need incentives, as above?

• General lack of motivation or insights.

Why NOT read software?

Slide 6

Reading and Inspections (3)

• “Fagan” inspections for defect discovery of any software artifact:

– I1. Preparation: what is to be done, plan the work etc. (*).

– I2. Individual reading: w/ complementary perspectives to maximize effect (*).

– I3. Common inspection meeting: assessing the reported defects.

– I4. Follow-up: correct and re-check quality, revise guidelines?

• Will look at steps I1-I2 here (*), with OORT1-7 guidelines for perspectives.

• In industry: typically 10% of effort on inspections, with net saving of 10-25%. So quality is “free”!

• Recently: much emphasis on OO software development, e.g. using Rational Rose tool to create UML design diagrams. But few, tailored reading techniques for such documents. Over 200,000 UML licenses, so big potential.

• Example: Ericsson in Norway: previously using SDL, now UML and Java. Had an old inspection process, but need new reading techniques.

Classic inspections

Slide 7

Reading and Inspections (4)

Different needs and techniques of reading, e.g.:

[Figure: a taxonomy of reading. In the problem space (needs): defect detection (design, requirements, code), usability (user interface), and construction. In the solution space (techniques, perspectives): technology families such as defect-based reading of SCR specifications (inconsistent, incorrect, omission, ambiguity), perspective-based reading of English requirements (tester, user, developer), usability-based reading of screen shots (novice, expert, error), plus horizontal/vertical traceability reading of UML diagrams. Each family is refined from general goal to specific goal, document (artifact), notation/form, and technique.]

Slide 8

OO Reading and UML Documents

UML artifacts/diagrams; the five used later are marked with *:

• Dynamic View

– Use cases (analysis) *
– Activities
– Interaction sequences *, collaborations
– State machines *

• Static View

– Classes *, with relationships: Generalization (IsA), Composition (PartsOf), Association (HasA), Dependency (DependsOn), Realization; extensibility via constraints and stereotypes.
– Descriptions (generated from UML) *
– Packages
– Deployment

• Unified Modeling Language, UML: just a notational approach; it does not propose/define how to organize the design tasks (process).

• Can be tailored to fit different development situations and software life-cycles (processes).

Slide 9

OO Reading and UML Documents (2)

• Requirements descriptions, RD: here structured text with numbered items, e.g. for a Gas Station. In other contexts: possibly with extra ER- and flow-diagrams.

• (Requirement) Analysis documents, UC: here use case diagrams in UML, with associated pseudo-code or comments (“textual” use case). A use case describes important concepts of the system and the functionalities it provides.

• Design documents, also in UML:

– Class diagrams, CD: describe the classes and their attributes, behaviors (functions = message definitions) and relationships.

– Sequence diagrams (special interaction diagrams), SqD: describe how the system objects are exchanging messages.

– State diagrams, StD: describe the states of the main system objects, and how state transitions can take place.

• Class descriptions, CDe: separate textual documentation of the classes, partly as UML-generated interfaces in some programming language.

Our six relevant software artifacts:

Slide 10

OO Reading and UML Documents (3)

The six software artifacts: requirement description, use cases, and four design documents in UML:

[Figure: Loan Arranger class diagram – the classes Loan Arranger, Loan (with subclasses Fixed_Rate Loan and Variable_Rate Loan), Lender, Borrower and Bundle, each with attributes (e.g. amount, interest rate, status) and operations (e.g. risk(), principal_remaining(), calculate_profit()), connected by associations with cardinalities such as 1, 1..* and 0..1.]

[Figure: Loan state diagram – the states Good, Late and Default. Transitions are triggered by the monthly report: payment on time [payment time <= due time] leads to Good, late payment [due time < payment time < due time + 10] to Late, and very late payment [payment time > due time + 10] to Default.]

[Figure: sequence diagram – Fanny May : Loan Arranger, Borrower : Borrower, A Lender : Specified Lender and Loan : Loan exchange messages such as verify_report(), new_loan(lender, borrowers), look_for_a_lender(lender), look_for_a_loan(loan), update_loan(lender, borrower), new_lender(name, contact, phone_number), update(lender), monthly_report(lender, loans, borrowers) and identify_report_format().]

[Figure: use case diagram – the actors Specified Lender, Investor and Loan Analyst interact with the Fanny May system through the use cases Receive Monthly Report, Receive Reports and Generate Reports, fed by a Monthly Report and an Investment Request.]

Loan Arranger class description (excerpt):

– Class name: Fixed_Rate Loan. Category: Logical View. Documentation: a fixed rate loan has the same interest rate over the entire term of the mortgage.
– External documents: –. Export control: Public. Cardinality: n. Hierarchy: superclass Loan. Public interface, operations: risk, principal_remaining.
– State machine: No. Concurrency: Sequential. Persistence: Persistent.
– Operation name: risk. Public member of: Fixed_Rate Loan. Return class: float. Documentation: take the average of the risks of all borrowers related to this loan; if the average risk is less than 1, round up to 1; else if it is less than 100, round up to the nearest integer; otherwise round down to 100. Concurrency: Sequential.
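The class description above is exactly the kind of text a UML tool can turn into a code skeleton. The slides do not show such generated code; the following is a minimal sketch (in Python, purely for illustration), with names taken from the description and the body of risk() following its documentation:

    import math

    class Loan:
        """Superclass of Fixed_Rate Loan, per the class description."""
        def __init__(self, borrowers):
            self.borrowers = borrowers  # each borrower carries a .risk value

    class FixedRateLoan(Loan):
        """A fixed rate loan has the same interest rate over the entire term."""

        def risk(self) -> float:
            # Average the risks of all borrowers related to this loan, then
            # round as specified: up to 1, up to the nearest integer, or down to 100.
            avg = sum(b.risk for b in self.borrowers) / len(self.borrowers)
            if avg < 1:
                return 1.0
            if avg < 100:
                return float(math.ceil(avg))
            return 100.0

        def principal_remaining(self) -> float:
            raise NotImplementedError  # declared in the interface, body left to the designer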

Loan-Arranger Requirements Specification – Jan. 8, 1999

Background

Banks generate income in many ways, often by borrowing money from their depositors at a low interest rate, and then lending that same money at a higher interest rate in the form of bank loans. However, property loans, such as mortgages, typically have terms of 15, 25 or even 30 years. For example, suppose that you purchase a $150,000 house with a $50,000 down payment and borrow a $100,000 mortgage from National Bank for thirty years at 5% interest. That means that National Bank gives you $100,000 to pay the balance on your house, and you pay National Bank back at a rate of 5% per year over a period of thirty years. You must pay back both principal and interest. That is, the initial principal, $100,000, is paid back in 360 installments (once a month for 30 years), with interest on the unpaid balance. In this case the monthly payment is $536.82. Although the income from interest on these loans is lucrative, the loans tie up money for a long time, preventing the banks from using their money for other transactions. Consequently, the banks often sell their loans to consolidating organizations such as Fannie Mae and Freddie Mac, taking less long-term profit in exchange for freeing the capital for use in other ways.

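As a cross-check of the background text: the quoted monthly payment follows the standard fixed-rate annuity formula, with principal $P = \$100{,}000$, monthly rate $r = 0.05/12$ and $n = 360$ payments:

$$M \;=\; \frac{P\,r}{1-(1+r)^{-n}} \;=\; \frac{100000\cdot(0.05/12)}{1-(1+0.05/12)^{-360}} \;\approx\; \$536.82.$$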

Slide 11

OO Reading and UML Documents (4)

Software Artifacts, with OO Reading Techniques (OORTs) indicated:

[Figure: the Requirements Descriptions and the Use Cases (requirements specification/analysis) at the top; the High-Level Design artifacts (Class Diagrams, Class Descriptions, State Diagrams, Sequence Diagrams) below. Horizontal reading (OORT-1, -2, -3, -4) compares design artifacts with each other; vertical reading (OORT-5, -6, -7) traces design artifacts back to the requirements and use cases.]

Slide 12

OO Reading and UML Documents (5)

• OORT-1: Sequence Diagram x Class Diagram (horizontal, static)

• OORT-2: State Diagram x Class Description (horizontal, dynamic)

• OORT-3: State Diagram x Sequence Diagram (horizontal, dynamic)

• OORT-4: Class Diagram x Class Description (horizontal, static)

• OORT-5: Class Description x Requirement Description (vertical, static)

• OORT-6: Sequence Diagram x Use Case Diagram (vertical, dyn./stat.)

• OORT-7: State Diagram x Rqmt Descr. / Use Case Diagr. (vertical, dynamic)

• Abbreviations: Requirement Description (RD), Use Case Diagram (UC), Class Diagram (CD), Class Description (CDe) to supplement CD, State Diagram (StD), Sequence Diagram (SqD).

The seven OO Reading Techniques (OORTs)

Slide 13

Reading Techniques and defect types:

[Figure: the defect types – omission, incorrect fact, extraneous information, inconsistency and ambiguity – shown as mismatches between domain knowledge / general requirements and the software (design) artifacts.]

• Software reading techniques try to increase the effectiveness of inspections by providing procedural guidelines that can be used by individual reviewers to examine (or “read”) a given software artifact (design doc.) and identify defects.

• As mentioned, there is empirical evidence that tailored software reading increases the effectiveness of inspections for many software artifacts, not just source code.

OO Reading and Defect Types

Slide 14

OO Reading and Defect Types (2)

Table 1 – Types of software defects, and their specific definitions for OO designs:

– Omission: one or more design diagrams that should contain some concept from the general requirements or from the requirements document do not contain a representation for that concept.

– Incorrect Fact: a design diagram contains a misrepresentation of a concept described in the general requirements or requirements document.

– Inconsistency: a representation of a concept in one design diagram disagrees with a representation of the same concept in either the same or another design diagram.

– Ambiguity: a representation of a concept in the design is unclear, and could cause a user of the document (developer, low-level designer, etc.) to misinterpret or misunderstand the meaning of the concept.

– Extraneous Information: the design includes information that, while perhaps true, does not apply to this domain and should not be included in the design.

Slide 15

OO Reading and Defect Types (3)

• Omission (conceptually using vertical information) – “too little”: Ex. Forgot to consider no-coverage on credit cards, forgot a state transition.

• Extraneous or irrelevant information (most often vertical) – “too much”: Ex. Has included both gasoline and diesel sales.

• Incorrect Fact (most often vertical) – “wrong”: Ex. The maximum purchase limit is $1000, not $100.

• Inconsistency (most often horizontal) – “wrong”: Ex. Class name spelled differently in two diagrams, forgot to declare a class function/attribute etc.

• Ambiguity (most often horizontal) – “unclear”: Ex. Unclear state transition, e.g. how a gas pump returns to “vacant”.

• Miscellaneous: other kinds of defects or comments.

Examples of defect types:

May also have defect severity: minor (comments), major, supermajor. Also IEEE STD on code defects: interface, initialization, sequencing, …

Slide 16

OO Reading and Defect Types (4)

• Horizontal Reading, for internal consistency of a design:

• Ensure that all design artifacts represent the same system.

• Design contains complementary views of the information:

– Static (class diagrams)

– Dynamic (interaction diagrams)

• Not obvious how to compare these different perspectives.

• Vertical Reading, for traceability between reqmts/analysis and design:

• Ensure that the design artifacts represent the same system as described by the requirements and use-cases.

• Comparing documents from different lifecycle phases:

– Level of abstraction and detail are different

Horizontal vs. Vertical Reading:

Slide 17

OO Reading and Defect Types (5)

Reader 1 – looking for consistency (horizontal reading)
Reader 2 – looking for consistency (horizontal reading)
Reader 3 – looking for traceability (vertical reading)

Each reader is an “expert” in a different perspective. The readers meet as a team to discuss a comprehensive defect list; the final list of all defects is sent to the designer for repair.

The design inspection process with OO reading techniques:

Slide 18

OO Reading-Related Concepts

• Levels of functionality in a design (used later in the OORTs):

– Functionality: high-level behavior of the system, usually from the user’s point of view. Often a use case. Ex. In a text editor: text formatting.

Ex. At a gas station: fill-up-gasoline and pay.

– Service: medium-level action performed internally by the system; an “atomic unit” out of which system functionalities are composed. Often a part of a use-case, e.g. a step in the pseudo-code. Ex. In a text editor: select text, use pull-down menus, change font selection. Ex. At a gas station: Transfer $$ from account N1 to N2, if there is coverage.

– Message (function): lowest-level behavior unit, out of which services and then functionalities are composed. Represents basic communication between cooperating objects to implement system behavior. Messages may be shown on sequence diagrams and must be defined in their respective classes. Ex. In a text editor: write out a character. Ex. At a gas station: add $$ to customer bill: add_to_bill(customer, $$, date).
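A minimal code sketch of the three levels for the gas-station example (all names are ours, for illustration only; the slides define the levels but show no code):

    class Account:
        def __init__(self, balance):
            self.balance = balance

    def transfer(src, dst, amount):
        # Service: an atomic unit, composed of low-level messages.
        if src.balance >= amount:        # condition: there must be coverage
            src.balance -= amount        # message to the src object
            dst.balance += amount        # message to the dst object
            return True
        return False

    def fill_up_and_pay(customer, station, amount):
        # Functionality: user-visible behavior, composed of services.
        return transfer(customer, station, amount)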

Slide 19

OO Reading-Related Concepts (2)

• Constraints/Conditions in requirements:

– Condition (i.e. local pre-condition): what must be true before a functionality/service etc. can be executed. Example from GSCS's 7. Payment (see p. 20-22): … If (payment time) is now, payment type must be by credit card or cash … If (payment time) is monthly, payment type must be by billing account …

– Constraint (more global): must always be true for some system functionality etc. Example from GSCS's 9.2 Credit card problem: … The customer can only wait for 30 seconds for authorization from the Credit Card System. …

– Constraints can, of course, be used in conditions to express exceptions.

– Both constraints and conditions can be expressed as notes in UML class / state / sequence diagrams.

Slide 20

Ex. Gas Station Control System

• 1. Gas station: Sells gasoline from gas pumps, rents parking spots, has a cashier and a GSCS.

• 2. Gas pump: Gasoline is sold at self-service gas pumps. The pump has a computer display and keyboard connected to the GSCS, and similarly a credit card reader. If the pump is vacant, the customer may dispense gasoline. He is assisted in this by the GSCS, which supervises payment (points 7-9) and finally resets the pump to vacant. Gasoline for up to $1000 can be dispensed at a time.

• 3. Parking spot: Regular customers may rent parking spots at the gas station. The cashier queries the GSCS for the next available parking spot, and passes this information back to the customer. See points 7-9 for payment.

• 4. Cashier: An employee of the gas station, representing the gas station owner. One cashier is on duty at all times. The cashier has a PC and a credit card reader, both communicating with the GSCS. He can rent out parking spots, and receive payment as in points 2 & 3 above, while returning a receipt.

Ex. Simplified requirement specification for a Gas Station Control System (GSCS), mainly for payment:

Slide 21

Ex. Gas Station Control System (2)

• 5. Customer: May fill up gasoline at a vacant gas pump, rent a parking spot at the cashier, and pay at the gas pump (for gasoline) or at the cashier. Regular customers are employed in a local business, which is cleared for monthly billing.

• 6. GSCS:
– Keeps an inventory of parking spots and gasoline, a register of regular customers and their businesses and accounts, plus a log of purchases.
– Has a user interface at all gas pumps and at the cashier's PC, and is connected to an external Credit Card System and to local businesses (via Internet).
– Computes the price for gasoline fill-ups, informs the cashier about this, and can reset the gas pump to vacant.
– Will assist in making payments (points 7-9).

• 7. Payment in general: Payment time and type are selected by the customer. Payment time is either now or monthly:
– If now, payment type must be by credit card or cash (incl. personal check).
– If monthly, payment type must be by billing account to a local business.
There are two kinds of purchase items: gasoline fill-up and parking spot rental. A payment transaction involves only one such item.

Slide 22

Ex. Gas Station Control System (3)

• 8. Payment type

• 8.1 By cash (or personal check): can only be done at the cashier.

• 8.2 By credit card: can be done either at the gas pump or at the cashier. The customer must swipe his credit card appropriately, but with no PIN code.

• 8.3 By billing account: the customer must give his billing account to the cashier, who adds the amount to the monthly bill of a given business account.

• 9. Payment exception

• 9.1 Cash (check) problem: The cashier is authorized to improvise.

• 9.2 Credit card problem: The customer can only wait for 30 seconds for authorization from the Credit Card System. If no response or incorrect credit card number / coverage, the customer is asked for another payment type / credit card. At the gas pump, only one payment attempt is allowed; otherwise the pump is reset to vacant (to not block the lane), and the customer is asked to see the cashier.

• 9.3 Business account problem: If the account is invalid, the customer is asked for another payment type / account number.
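Read as a small protocol, point 9.2 pins down one condition (a single attempt at the pump) and one constraint (the 30-second authorization limit). A sketch under those rules; the object interfaces below are our assumptions, not part of the requirements:

    AUTH_TIMEOUT = 30  # seconds, the constraint from point 9.2

    class StubCreditCardSystem:
        """Stand-in for the external Credit Card System (assumed interface)."""
        def authorize_payment(self, card, amount, timeout):
            return True  # or False/None on no response or no coverage

    class Pump:
        def reset_to_vacant(self): print("pump is vacant")
        def display(self, msg): print(msg)
        def log_purchase(self, amount): print("purchase logged:", amount)

    def pay_at_pump(pump, ccs, card, amount):
        """Credit-card payment at the gas pump: only one attempt (point 9.2)."""
        ok = ccs.authorize_payment(card, amount, timeout=AUTH_TIMEOUT)
        if not ok:
            pump.reset_to_vacant()   # do not block the lane
            pump.display("Please see the cashier.")
            return False
        pump.log_purchase(amount)    # the GSCS keeps a log of purchases (point 6)
        return True

    pay_at_pump(Pump(), StubCreditCardSystem(), card="demo", amount=250)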

Slide 23

Ex. Gas Station Control System (4)

• What about no more gasoline, or no more parking spots?

• How should the user interface dialogs be structured?

• Are all credit cards allowed, including banking cards (VISA etc.)?

• What kind of information should be transferred between gas pumps and the GSCS, between the cashier and the GSCS etc.?

• How to collect monthly payment from local businesses?

• How many payment attempts should be given to the customer at the cashier?

• What if the customer ultimately cannot pay?

Example, part 1: Possible weaknesses in GSCS requirements:

Can be found by special reading techniques for requirements, but this is outside our scope here.

Slide 24

Ex. Gas Station Control System (5)

• Example, part 2: Parking Spot related messages in a sequence diagram for Gas Station.

[Figure: sequence diagram with the participants Customer : Customer, Gas Station Owner : Gas Station Owner, Credit Card System : Credit_Card System, Purchase : Purchase, Customer Bill : Bill and Parking Spot : Parking_Spot. Messages: parking_spot_request(account_number); next_available(); where_to_park(available parking_spot); lease_parking_spot(parking_spot, payment time, payment type); authorize_payment(customer, amount, date) [payment type = Credit Card and payment time = now]; new_purchase(customer, parking_spot, date) [response time < 30 secs]; new_payment_type_request() [response time >= 30 secs or credit card not authorized, and payment time = now]; add_to_bill(customer, amount, date) [payment time = monthly].]

Slide 25

Ex. Gas Station Control System (6)

• Example, part 3: Abstracting messages into two services for the Gas Station – GetParkingSpot (“dotted” lines) and PayParkingSpot (“solid” lines).

[Figure: the same sequence diagram as in part 2, with the messages abstracted into the two services GetParkingSpot (dotted lines) and PayParkingSpot (solid lines).]

Slide 26

Ex. Gas Station Control System (7)

• Example, part 4: Checking whether a constraint is fulfilled in Gas Station class diagram:

[Figure: class diagram fragment – the class Credit_Card System (from External Systems) with the operation authorize_payment(customer, amount, date), annotated with the note: “response time should be less than 30 seconds for all Credit Card Systems”.]

Slide 27

OORT experimentation

• Receiving feedback from users of the techniques: controlled experiments, observational studies.

• Revising the techniques based on feedback: mostly qualitative, also quantitative.

• Continually evaluating the techniques to ensure they remain feasible and useful.

• Negotiating with companies to implement OORTs on real development projects.

• Goal: to assess effectiveness on industrial projects … Are time/effort requirements realistic? Do the techniques address real development needs?

• … using experienced developers. Is there “value added” also for more experienced software engineers?

Empirical Evaluations of OORTs

Slide 28

OORT experimentation (2)

What we know:

• Techniques are feasible

• Techniques help find defects

• Vertical reading finds more defects of omission and incorrect fact

• Horizontal reading finds more defects of inconsistency and ambiguity

What we don’t know:

• What influence does domain knowledge have on the reading process?
– Horizontal x Vertical

• Can we automate a portion of the techniques, e.g. by an XMI-based UML tool? (a sketch follows below)
– Some steps are repetitive and mechanical
– Need to identify the clerical activities

• See also the conclusion, and:
– http://www.cs.umd.edu/Dienst/UI/2.0/Describe/ncstrl.umcp/CS-TR-4070
– http://fc-md.umd.edu/reading.html
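To give a feel for the “clerical” part: a sketch of how an XMI-based tool could mechanize OORT-1's Q12.a (every sequence-diagram object/class must appear in the class diagram). XMI schemas differ between tools and versions, so the tag and attribute names below are assumptions:

    import xml.etree.ElementTree as ET

    def class_names(xmi_path):
        """Collect names of elements whose tag looks like a UML class
        in a (simplified, assumed) XMI export."""
        names = set()
        for elem in ET.parse(xmi_path).iter():
            tag = elem.tag.split('}')[-1]          # strip any XML namespace
            if tag.endswith('Class') and elem.get('name'):
                names.add(elem.get('name'))
        return names

    def check_q12a(sqd_participants, xmi_path):
        """Flag SqD participants missing from the class diagram."""
        for name in sorted(set(sqd_participants) - class_names(xmi_path)):
            print('possible inconsistency:', name, 'not found in class diagram')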

Slide 29

OORT experimentation (3)

• Controlled Experiment I, Autumn 1998:

– Undergraduate Software Engineering class, UMD

– Goal: Feasibility and Global Improvement

• Observational Studies, FC-UMD, Summer 1999:

– Goal: Feasibility and Local Improvements

• Observational Studies II, UMD, Autumn 1999:

– Two Graduate Software Engineering Classes, UMD,

– Goal: Observation and Local Improvement

• Controlled Experiment III, Spring 2000:

– Undergraduate Software Engineering Class, UMD,

– Goal: General life-cycle study (part of larger experiment)

• And more at UMD…

• One Postmortem, Three Controlled Exper., one Feasibility Study, 2000-02:

- The E0-E4 at NTNU/Ericsson

Experiments so far:

Slide 30

E0: Post-mortem of SDL-inspections at Ericsson-Oslo, 1997-99

• Overall goals: Study existing inspection process, with SDL diagrams (emphasis) and PLEX programs. Possibly suggest process changes.

• Context:

– Inspection process for SDL in Fagan-style, adapted by Tom Gilb / Gunnar Ribe at Ericsson in early 90s.

But now UML, not as “linear” as SDL.

– Good company culture for inspections.

– Internal, file-based inspection database also with data for some test phases.

– One project release A with 20,000 person-hours.

– Further releases B-F with 100,000 person-hours.

• Post-mortem study:

– Three MSc students dug up and analyzed the data.

– To learn and to test hypotheses, e.g. about role of complexity and persistent longitudinal relationships.

– Good support by local product line manager, Torbjørn Frotveit.

Slide 31

E0: Post-mortem of SDL-inspections at Ericsson-Oslo, 1997-99 (2)

• Main findings: (PROFES’99 and NASA-SEW’99 papers, chapter in Gilb-book’02)

– Three defect types: Supermajor, Major, (+ Comments) – and impossible to extend by us later.

– Effectiveness/efficiency: 70% of all defects caught in individual reading (1ph/defect), 6% in later meetings (8ph/defect).

Average defect efficiency in later unit&function testing: 8-10ph/defect.

– Actual inspection rate: 5ph/SDL-page, recommended rate: 8ph/SDL-page; implies that we could have found 50% more defects at 1/6 of later test costs.

– Release A: 1474 ph on inspections, saves net 6700 ph (34%) of 20,000 ph.

– Releases B-F: 20,515 ph on inspections, save net 21,000 ph (21%) of 100,000 ph (percentages re-checked below).

– Release A: no correlation between module complexity (#states) and number of defects found in inspection – i.e. puts the best people on the hardest tasks!

– Release B-F: many non-significant correlations on defect-proneness across phases and releases – but too little data on the individual module level.

• But: Lots of interesting data, but little local after-analysis, e.g. to understand and tune process. However, new process with Java/UML from 1999/2000.
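A quick arithmetic re-check of the saving figures above: 6700/20,000 = 33.5% ≈ 34% for release A, and 21,000/100,000 = 21% for releases B-F. And every defect caught in reading (≈1 ph) instead of in testing (8-10 ph) avoids roughly 7-9 ph of test effort.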

Slide 32

E1: First OORT experiment, NTNU, Spring 2000

• Overall goals: Learn defect detection techniques and specially OORTs, check if the OORTs are feasible and receive proposals to improve them, compare discovered defects with similar experiments at Univ. Maryland (course CS735).

• Context: QA/SPI course, 19 students, 9 groups, pass/no-pass, T.Dingsøyr as TA.

• Process:

– Make groups: two students in each group (a pair), based on questionnaires. Half of the groups are doing OORTs 2,3,6,7 (mainly state diagrams), the other half OORTs 1,4,5,6 (mainly class/sequence diagrams).

– General preparation: Two double lectures on principles and techniques (8 h).

– Special preparation (I1): Look at requirements and guidelines (2h, self study).

– OO Design Reading (I2): Read and fill out defect/observ. reports (8h, paired); one group member executes the reading, the other is observing the first.

• Given documents: lecture notes w/ guidelines for OORT1-7 and observation studies, defect and observation report forms, questionnaires to form groups and resulting group allocation, set of “defect-seeded” software documents (RD, UC, CD, CDe, SqD, and StD) – either for Loan Arranger (LA) or Parking Garage (PG) example. Three markup pens.

Slide 33

E1: First OORT experiment, NTNU, Spring 2000 (2)

• Differences from the UMD experiment, Autumn 1999:

– OORT instructions operationalized (as Qx.ij) for clarity and tracing by Conradi. See Appendix for OORT-1.
– LA is considered harder and more static, PG easier and more dynamic?
– UMD and NTNU are different university contexts.

• Qualitative findings:

– Big variation in effort, dedication and results: e.g. some teams did not report effort data, and some even did the wrong OORTs.
– Big variation in UML expertise.
– Students felt frustrated by the extent of the assignment, and felt cheated because the indicated efforts were too low.
– Lengthy and tedious pre-annotation of artifacts before real defect detection could start (”slave labor”). Many defects were discovered already during annotation, even defects that remained unreported.
– OORTs too ”heavy” for the given (small) artifacts?
– Some confusion about the assignments: what, how, on which artifacts, …?
– But also many positive and concrete comments.

Slide 34

E1: First OORT experiment, NTNU, Spring 2000 (3)

• Quantitative findings:

– Recorded defects and comments:
Parking Garage: 17 (of 32) seeded defects, 3 new defects, plus 43 comments.
Loan Arranger: 17 (of 30) seeded defects, 4 new defects, plus 44 comments.

– Defect and comment occurrences (5 PG groups, 4 LA groups; sum, average, range):
PG: 33 defect occurrences, avg. 6 per group (4..10), plus on avg. 1 new (0..2); 68 comment occurrences, avg. 14 (3..22).
LA: 52 defect occurrences, avg. 11 per group (7..14), plus on avg. 2 new (0..4); 72 comment occurrences, avg. 18 (9..37).

– Defects reported more than once:
PG: 11 of 13 duplicate defect occurrences found by different OORTs.
LA: 12 of 31 duplicate defect occurrences found by different OORTs.

– Effort spent (for 4 OORTs, counting one person per team; discrepancy = defect or comment):
PG: 6-7 person-hours, ca. 3 discrepancies/ph.
LA: 10-13 person-hours, ca. 2.5 discrepancies/ph.

– Note: 2X more ”comments” than pure defects … and long arguments on what is what! A comment can concern details as well as architecture.

Slide 35

E1: First OORT experiment, NTNU, Spring 2000 (4)

• Quantitative findings (cont’d 2):

Defect/OORT types in PG inspection – for 33 defect occurrences:

– Omission: OORT-1: 1, OORT-2: 3, OORT-3: 6, OORT-4: 1, OORT-5: 3, OORT-6: 1, OORT-7: 1 (sum 16).
– Extraneous: 2 occurrences (1 + 1).
– Incorrect Fact: OORT-1: 2, OORT-2: 3, OORT-3: 1, OORT-4: 3, OORT-5: 1, OORT-6: 1 (sum 11).
– Ambiguity: 1 occurrence.
– Inconsistency: 3 occurrences (1 + 2).
– Miscellaneous: none.
– Totals per OORT-1..7: 3, 7, 8, 8, 4, 2, 1 (sum 33).
– A ”mixed” profile; High / High / High / Middle.

Slide 36

E1: First OORT experiment, NTNU, Spring 2000 (5)

• Quantitative findings (cont’d 3):

Defect/OORT types in LA inspection – for 52 defect occurrences:

– Omission: 7 occurrences.
– Extraneous: none.
– Incorrect Fact: 14 occurrences (3 + 5 + 1 + 3 + 2).
– Ambiguity: none.
– Inconsistency: 31 occurrences (20 + 2 + 7 + 2); the 20 stem from OORT-1.
– Miscellaneous: none.
– Totals per OORT-1..7: 23, 7, 1, 10, 9, –, 2 (sum 52).
– A ”static” profile; Very High / Middle / High / High.

Slide 37

E1: First OORT experiment, NTNU, Spring 2000 (6)

• Quantitative findings (cont’d 4):

Comment types/causes in PG inspection – for 43 comments (not 68 occurrences). Causes: Missing Behaviour, Missing Attribute, Typo/Spelling, System Border, Clarification, Other.

– Omission: Missing Behaviour 7, Missing Attribute 5, System Border 10, Clarification 1, Other 2 (sum 25).
– Extraneous: none.
– Incorrect Fact: 6 in total (1 + 1 + 4).
– Ambiguity: 5 in total (2 + 3).
– Inconsistency: 4.
– Miscellaneous: 3.
– Totals per cause: Missing Behaviour 8, Missing Attribute 5, Typo/Spelling 4, System Border 12, Clarification 8, Other 6 (sum 43).

Slide 38

E1: First OORT experiment, NTNU, Spring 2000 (7)

• Quantitative findings (cont’d 5):

Comment types/causes in LA inspection – for 44 comments (not 72 occurrences). Causes: Missing Behaviour, Missing Attribute, Typo/Spelling, System Border, Clarification, Other.

– Omission: 20 in total (16 + 1 + 3); the 16 concern Missing Behaviour.
– Extraneous: none.
– Incorrect Fact: 7 in total (1 + 1 + 5).
– Ambiguity: 7 in total (1 + 2 + 2 + 2).
– Inconsistency: 6 in total (3 + 3).
– Miscellaneous: 4 in total (3 + 1).
– Totals per cause: Missing Behaviour 17, Missing Attribute 2, Typo/Spelling 6, System Border 5, Clarification 7, Other 7 (sum 44).

Slide 39

E1: First OORT experiment, NTNU, Spring 2000 (8)

• Lessons:

– Some unclear instructions: executor/observer role, Norwegian file names, file access, some typos. First read the RD?
– Some unclear concepts: service, constraint, condition, …
– UML: not familiar to some groups.
– Technical comments on artifacts and OORTs:
Add comments/rationale to diagrams: UC and CD are too brief.
CDe hard to navigate in – add separators.
SqD had method parameters, but CD did not – how to check?
Need several artifacts (also RD) to understand some OORT questions.
Many trivial typos and naming defects in the artifacts, caused by the UML tool?:
Parking Garage artifacts need more work.
Fanny May = Loan Arranger? Lot = Parking Garage?
LA vs. Loan Arranger vs. LoanArranger, gate vs. Gate, CardReaders vs. Card_Readers.
All relations in the CD had cardinalities reversed!
… really frustrating to deal with poor-quality artifacts.

Slide 40

E2: OORT feasibility study at NTNU, Autumn 2001

• Overall goals: Go through all OORT-related material to learn and propose changes, first for NTNU experiment and later for Ericsson experiment.

• Context: Two senior MSc students at NTNU, Hegde and Arif in Depth Project course (half semester), each repeating E1 as a combined executor/observer.

• Process:

– Repeat E1 process, but so that each executor is doing all OORTs on either LA or PG.

– Analysing data and suggesting future improvements based on both E1 and E2.

• Findings (next slide):

– Used about 15 hours each, 2X that of E1 groups – but did all 7 OORTs, not 4.

– 28 (PG) resp. 27 (LA) defects found: 3X as many per OORT as in E1.

– Found 9 more PG defects: 35+9 = 44, 4 more LA defects: 34+4 = 38.

– Found 34 more PG comments: 43+34 = 77, and 10 more LA comments: 34+10 = 44.

– About 4 discrepancies/ph, 50% more than in E1.

– Many good suggestions for improvements on OORTs, artifacts, and process.

– So motivation means a lot!

Slide 41

E2: OORT feasibility study at NTNU, Autumn 2001 (2)

• Quantitative findings (cont’d 2):

Results from PG/LA inspections – NB: #defects = #occurrences

Background and results               | PG (Hegde, all OORTs) | LA (Arif, all OORTs)
Industrial background                | Low-Med               | Low-Med
UML background                       | Low                   | Med-High
#defects recorded                    | 28 (of 44, +9 new)    | 27 (of 38, +4 new)
#comments recorded                   | 40 (w/ 34 new)        | 25 (w/ 10 new)
Total effort (min)                   | 900                   | 910
Effort per discrepancy (def.+comm.)  | 13 min (4.5/ph)       | 17.5 min (3.8/ph)

Slide 42

E3: Second OORT experiment, NTNU, Spring 2002

• Overall goals: As in E1, but also to try out certain OORT changes for later industrial experiments in E4.

• Context: Sw Arch. course, 42 students, 21 groups, pass/no-pass, Hegde and Arif as TAs.

• Process:

– Mainly as in E1, but a web-application was used to manage artifacts and filled-in forms.

– OORTs enhanced for readability (not as terse as in E1), and generally polished and corrected.

• Given documents: Mostly as in E1, but trivial defects (typos, spellings etc.) in artifacts corrected to reduce the amount of “comments”.

Slide 43

E3: Second OORT experiment at NTNU, Spring 2002 (2)

• Changes in OORTs for E3 experiment, based on E1/E2 insights:

Change type                        | Count
Error in original OORTs from UMD   | 3
Error in E1-conversion to Qx.ij    | 10
Question rephrased                 | 20
Comments added (more words)        | 22
Total                              | 55

Slide 44

E3: Second OORT experiment at NTNU, Spring 2002 (3)

• Quantitative findings for PG:

Group | OORTs   | Defects seeded | Old defect occ. | Old comment occ. | New comment occ. | Total discrep. | Effort (min) | Efficiency (disc./ph)
10    | 2,3,6,7 | 44             | 4               | 1                | 3                | 8              | 337          | 1.42
11    | 2,3,6,7 | 44             | 4               | 0                | 4                | 8              | 370          | 1.30
12    | 2,3,6,7 | 44             | 3               | 0                | 5                | 8              | 210          | 2.29
13    | 2,3,6,7 | 44             | 4               | 9                | 23               | 36             | 870          | 2.48
14    | 1,4,5,6 | 44             | 8               | 5                | 7                | 20             | 232          | 5.17
15    | 1,4,5,6 | 44             | 5               | 11               | 8                | 24             | 445          | 3.24
16    | 1,4,5,6 | 44             | 5               | 2                | 3                | 10             | 215          | 2.79
17    | 1,4,5,6 | 44             | 7               | 7                | 7                | 21             | 230          | 5.48
18    | 1,4,5,6 | 44             | 3               | 4                | 9                | 16             | 266          | 3.61
19    | 1,4,5,6 | 44             | 6               | 10               | 11               | 27             | 320          | 5.06
30    | 1,4,5,6 | 44             | 4               | 10               | 11               | 25             | 390          | 3.85
Sum   |         | (44)           | 53 (21 defects) | 59 (77 old)      | 91 (78 new)      | 203            | 3885         |
Mean  |         |                | 4.8             | 5.4              | 8.3              | 18.5           | 353.2        | 3.30
Median|         |                | 4               | 5                | 7                | 20             | 320          | 3.24
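The efficiency column is simply the total number of discrepancies divided by the effort converted to person-hours; e.g. for group 13: 36 / (870 min / 60) = 36 / 14.5 ≈ 2.48 disc./ph.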

Slide 45

E3: Second OORT experiment, NTNU, Spring 2002 (4)

• Quantitative findings for LA:

Group | OORTs   | Defects seeded | Old defect occ. | Old comment occ. | New comment occ. | Total discrep. | Effort (min) | Efficiency (disc./ph)
20    | 2,3,6,7 | 38             | 2               | 0                | 3                | 5              | 286          | 1.05
21    | 2,3,6,7 | 38             | 3               | 4                | 9                | 16             | 245          | 3.92
22    | 2,3,6,7 | 38             | 5               | 2                | 5                | 12             | 162          | 4.44
23    | 2,3,6,7 | 38             | 3               | 0                | 17               | 20             | 270          | 4.44
24    | 2,3,6,7 | 38             | 0               | 0                | 3                | 3              | 230          | 0.78
25    | 1,4,5,6 | 38             | 11              | 9                | 11               | 31             | 385          | 4.83
26    | 1,4,5,6 | 38             | 4               | 2                | 7                | 13             | 400          | 1.95
27    | 1,4,5,6 | 38             | 14              | 10               | 17               | 41             | 280          | 8.97
28    | 1,4,5,6 | 38             | 4               | 1                | 4                | 9              | 265          | 2.04
29    | 1,4,5,6 | 38             | 11              | 13               | 11               | 35             | 270          | 7.78
31    | 1,4,5,6 | 38             | 12              | 19               | 9                | 40             | 315          | 7.62
Sum   |         | (38)           | 69 (21 defects) | 60 (44 old)      | 96 (72 new)      | 225            | 3108         |
Mean  |         |                | 6.3             | 5.5              | 8.7              | 20.5           | 282.5        | 4.30
Median|         |                | 4               | 2                | 9                | 16             | 270          | 4.44

Slide 46

E3: Second OORT experiment, NTNU, Spring 2002 (5)

• Findings, general:

– PG: 4-5 defects found per group; in total 21 (of 44) defects in 53 occurrences, and 150 comment occurrences with 78 new comments (*). Mean: 3.3 discr./ph.

– LA: 4-6 defects found per group; in total 21 (of 38) defects in 69 occurrences, and 156 comment occurrences with 72 new comments (*). Mean: 4.3 discr./ph.

– OORT-2,3,6,7 found fewer defects than OORT-1,4,5,6.

– Efficiency: 3-4 discrepancies/ph, or ca. 1 in an industrial context (3-4 inspectors).

– Group variance: 1:4 (for PG) and 1:11 (for LA) in #defects found and in #defects/ph. Motivation was rather poor and variable?

– Cleaning up artifacts for trivial defects (typos etc.): did not reduce number of comments, still 3X as many comments as defects!

*) New comments must be analyzed further, possibly new defects here also.

Slide 47

E4: Industrial OORT experiment at Ericsson-Grimstad, Spring 2002

• Overall goals: Investigate feasibility of new OORTs vs. old checklist/view-based reading techniques in an industrial environment at Ericsson-Grimstad.

• Context:

– Ericsson site in Grimstad, ca. 400 developers, 250 doing GPRS work; 10 developers were paid to perform the experiment, as part of the INCO and PROFIT projects. Much internal turbulence due to down-sizing in Spring 2002.
– An overall inspection process was in place, but the individual techniques needed an upgrade for UML reading, and generally better metrics.
– Hegde and Arif as TAs, supplemented with two local MSc students from HiA, and NTNU PhD student Mohagheghi from Ericsson as internal coordinator.

• Process:

– First a lecture on the revised OORTs, then individual reading (preparation), then a common inspection meeting. Later data analysis by HiA/NTNU.
– The OORTs were adapted for the lack of a CDe (OORT-4: CDxCDe) and an RD (using UC instead), e.g. no OORT-5: CDxRD.
– Artifacts: on paper and in internal file catalogs; forms: on the web.

• Given documents: no RD (too big, and with appendices), UC, CD (extended, also serving as CDe), one StD (made hastily from a DFD), and SqD. Only increments were to be inspected, according to a separate list.

Slide 48

E4: Industrial OORT experiment at Ericsson-Grimstad, Spring 2002 (2)

• Quantitative results from Ericsson-Grimstad:

Defects and effort          | Partial baseline, June 2001 - March 2002 | Current view-based techniques, 5 persons, May 2002 | Revised OORTs, 4 persons, May 2002
#defect occur. by reading   | 84                                       | 17 (all different)                                 | 47 (39 different)
#defect occur. in meeting   | 82                                       | 8                                                  | 1
Reading effort              | 99.92 ph (0.84 def/ph)                   | 10 ph (1.70 def/ph)                                | 25.5 ph (2.29 def/ph)
Meeting effort              | 214.92 ph (0.38 def/ph)                  | 8.25 ph (0.97 def/ph)                              | 9 ph (0.11 def/ph)
Total effort                | 314.84 ph (0.53 def/ph)                  | 18.25 ph (1.37 def/ph)                             | 34.5 ph (1.57 def/ph)

Slide 49

E4: Industrial OORT experiment at Ericsson-Grimstad, Spring 2002 (3)

• Lessons learned in E4 at Ericsson:

– General:
Reading (preparation) is generally done too fast – already known.
Weak existing baseline and metrics, but under improvement.
Inspections are for finding defects; design reviews give “deeper” design comments.

– OORTs vs. view-based reading/inspection:
OORT data are for 4 inspectors; the 5th did not deliver his forms.
The OORTs perform a bit better than view-based reading in efficiency, and find 2X more defects.
The view-based technique finds 1/3 of its total defects in the inspection meeting!
Duplicate defect occurrences still need cross-checking, but there was no overlap between the defects found by view-based reading and by the OORTs!
Both outperform the inspection “baseline” by almost 3X in efficiency.
So many aspects to follow up!

Slide 50

Conclusion

• Lessons learned in general for E0-E4:

– Industrial artifacts are enormous:
Requirements of 100s of pages, with ER/DF/Sq diagrams (not only text).
Software artifacts cover entire walls – use increments and their CRs?

– Industrial baselines are thin, so it is hard to demonstrate a valid effect.

– The OORTs seem to do the job, but:
There are still many detailed Qx.ij that are not answerable (make transition matrices).
Some redundancies in annotations/questions, plus a need for industry tailoring.
What to do with the “comments” (2-3X the defects)?
Domain knowledge and UML expertise: do they not matter?

– On using students:
Ethics: slave labor and intellectual rights – many angry comments back.
Norwegian students are not as motivated as American ones – integrate the OORTs better in courses, or use paid volunteers? Is a 20h exercise too large?

– Conclusions: We must do more experimentation in industry, and find out how to use more realistic artifacts in student experiments.

Slide 51

OORT-1: Sequence Diagram x Class Diagram

• Inputs:

– 1. A class diagram, possibly in several packages.

– 2. Sequence diagrams.

• Outputs:

– 1. Annotated versions of above diagrams.

– 2. Discrepancy reports.

• Goal: To verify that the class diagram for the system describes classes and their relationships in such a way that the behaviors specified in the sequence diagrams are correctly captured.

• Instructions:

– Do steps R1.1 and R1.2.

Appendix: OORT-1: Sequence Diagram x Class Diagram, Spring 2000 version (E1 and E2)

Slide 52

OORT-1: Sequence Diagram x Class Diagram (2)

• Inputs:

– 1. Sequence diagram (SqD).

• Outputs:

– 1. System objects, classes and actors (underlined with blue on SqD);

– 2. System services (underlined with green on SqD);

– 3. Constraints/conditions on the messages/services (circled in yellow on SqD). I.e., a marked-up SqD is produced, and will be used in R1.2.

• Instructions (matching the outputs above):

– Q11.a: Underline system objects, classes and actors in blue on SqD.

– Q11.b: Underline system services in green on SqD.

– Q11.c: Circle constraints/conditions on messages/services in yellow on SqD.

Step R1.1: From a sequence diagram – identify system objects, system services, and conditions.

Slide 53

OORT-1: Sequence Diagram x Class Diagram (3)

• Inputs:

– 1. Marked up sequence diagrams (SqDs) – from R1.1.

– 2. Class diagrams (CDs).

• Outputs:

– 1. Discrepancy reports.

• Instructions (as questions – here and after):

– Q12.a: Can every object/class/actor in the SqD be found in the CD? Possible [inconsistency?]

– Q12.b Can every service/message in the SqD be found in the CD, and with proper parameters? [inconsistency?]

– Q12.c: Are all system services covered by (low-level) messages in the SqD? Possible [omission?]

– Q12.d: Is there an association or other relationship between two classes in case of message exchanges? [omission?]

– Q12.e: Is there a mismatch in behavior arguments or in how constraints / conditions are formulated between the two documents? [inconsistency?]

Step R1.2: Check related class diagrams, to see if all system objects are covered.

Slide 54

OORT-1: Sequence Diagram x Class Diagram (4)

• Step R1.2 instructions (cont’d):

– Q12.f: Can the constraints from the SqD in R1.1 be fulfilled? E.g. the number of objects that can receive a message (check cardinality in the CD)? The range of data values? Dependencies between data or objects? Timing constraints? Report any problems. [inconsistency?]

– Q12.g: Overall design comments, based on own experience, domain knowledge and understanding: e.g. do the messages and their parameters make sense for this object? Are the stated conditions appropriate? Are all necessary attributes defined? Do the defined attributes/functions on a class make sense? Do the classes/attributes/functions have meaningful names? Are class relationships reasonable and of the correct type (e.g. association vs. composition)? Report any problems. [incorrect fact?]
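To make the R1.2 questions concrete: a toy sketch of Q12.b and Q12.d over hand-extracted diagram data (the data structures and names below are ours, not part of the OORT material):

    # Hand-extracted from the marked-up SqD (step R1.1) and from the CD.
    sqd_messages = {('Loan Arranger', 'add_to_bill', 3)}   # (receiver, message, #parameters)
    cd_operations = {('Loan Arranger', 'add_to_bill', 3)}  # operations declared in the CD
    sqd_edges = {('Customer', 'Loan Arranger')}            # classes exchanging messages
    cd_associations = {('Customer', 'Loan Arranger')}      # relationships in the CD

    def q12b(messages, operations):
        """Q12.b: every SqD message must be declared in the CD, with matching parameters."""
        for m in messages - operations:
            print('possible inconsistency:', m, 'not declared in the CD')

    def q12d(edges, associations):
        """Q12.d: classes that exchange messages need some relationship in the CD."""
        known = {frozenset(p) for p in associations}       # treat pairs as undirected
        for a, b in edges:
            if frozenset((a, b)) not in known:
                print('possible omission: no relationship between', a, 'and', b)

    q12b(sqd_messages, cd_operations)
    q12d(sqd_edges, cd_associations)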