
Raffaella Settimi

School of CTI

In collaboration with Dr. Jane Huang and PhD student Xuchang (Cindy) Zou

Funded by NSF grant #: CCR-0306303

Automated Requirements Traceability: an application of IR techniques in Software Engineering

Outline

Requirements traceability: motivation and challenges.

Poirot: the tracemaker ©

Probabilistic methods for automated tracing

Extensions to the basic algorithm to improve precision of results

Future work: What makes a software system traceable?

Requirements Traceability

“Requirements traceability is the ability to describe and follow the life of a requirement, in both a forward and backward direction, i.e. from its origins, through its development and specification, to its subsequent deployment and use, and through periods of ongoing refinement and iteration in any of these phases.”

(Gotel and Finkelstein, 1994.)

Requirements Traceability - Goals

To verify that a generated software system satisfies the specified requirements.

To verify that all requirements have been implemented by the end of the lifecycle.

To analyze the impact of proposed changes on the system.

To support change management and impact analysis in evolving software systems.

The Challenge

Traceability structures such as matrices are VERY HARD to maintain.

Too many traceability links result in an unwieldy tangle of useless information.

Many current traceability techniques do not provide sufficient ‘typing’ to support automation, reducing the value of the traceability effort.

Current approaches require intensive human effort.

Traceability is a hard sell because it is often perceived to have an insufficient ROI.

Challenges (cont'd)

Objects change and evolve as the system is developed (40-90% of development costs)

Different levels of traceability, typically determined by project-level directives: coarse-grained or fine-grained.

Dynamic Generation of Traceability Links

Goal: identify and retrieve the set of artifacts in a software system impacted by new requirements on an “as-needed” basis.

Action: dynamically generate a traceability scheme to identify directly and indirectly related artifacts.

Method: identify potential links by applying Information Retrieval (IR) techniques.


/*
 * Notification_Processing.java
 * @author Fuhu Liu
 * @version 5.0
 * Changed log file to log DB
 */
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import java.io.*;
import java.util.*;
import java.sql.*;
....
void UpdateDisplayList() {
    listModel.removeAllElements();
    String mSQL = "SELECT distinct" + " SubscriberName FROM EventDetails";
    try {
        rs = stmt.executeQuery(mSQL);
        while (rs.next()) {
            String SubsName = rs.getString(1);
            listModel.addElement(SubsName);
        }
        rs.close();
    } catch (Exception e) {
        System.out.println("Notification_Processing: Problem with query: " + e);
    }
}

An example of traceable artifacts and selected links

Automated Requirements Traceability

The analyst searches through the software artifacts in the system to identify the ones that satisfy a certain requirement or will potentially be impacted by any change in this requirement.

Automated traceability methods decrease the effort needed to construct and maintain a set of traceability links, and provide traceability across a much broader set of documents.

Equivalent to “googling” a requirement and searching through the software artifacts.

Software artifacts such as requirements, design documents and source code contain large amounts of textual information.

IR techniques can be used to search for links between software artifacts.

Dynamic trace retrieval alleviates problems related to explicit trace creation and maintenance.

Trace Retrieval


Modern Information Retrieval: An overview

IR systems rank documents by estimating their relevance to a user query.

A numeric score (similarity index) that measures the similarity in content between a document and the user query is assigned to each document.

Results are measured by:

recall = proportion of correctly retrieved links out of all correct links

precision = proportion of correctly retrieved links out of all retrieved links
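The two measures can be illustrated with a minimal Python sketch. The link pairs below are hypothetical and only serve to show the arithmetic; the function name `recall_precision` is likewise an illustration, not part of any tool named in these slides.

```python
def recall_precision(retrieved, correct):
    """Return (recall, precision) for a set of retrieved links."""
    retrieved, correct = set(retrieved), set(correct)
    hits = len(retrieved & correct)  # correctly retrieved links
    recall = hits / len(correct) if correct else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical (requirement, artifact) links, for illustration only.
correct = {("R1", "Scheduler"), ("R1", "RoutePlanner"), ("R2", "Scheduler")}
retrieved = {("R1", "Scheduler"), ("R2", "Scheduler"),
             ("R2", "Truck"), ("R3", "Truck")}

recall, precision = recall_precision(retrieved, correct)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Here two of the three correct links are retrieved, and two of the four retrieved links are correct.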

Some approaches

Several research prototypes, such as:

POIROT (Probabilistic approach) (http://golevka.cstcis.cti.depaul.edu/Poirot): IWPSE’04, ICSE’05, IEEE Computer’07 (Cleland-Huang, Settimi, et al.)

RETRO (Vector Space Model): RE’03, RE’04 (Hayes, Dekhtyar et al.)

Latent Semantic Indexing

In general, at recall levels of 90-95%, precision levels are around 10-40%.

Several Datasets for testing

Hard to find… we have 7 software projects, including:

IBS (Ice-Breaker System) describes the requirements and design of a public-works department system for managing road de-icing. 164 requirements and 71 UML classes.

PACIS (Public Address/Customer Information Screens) is a large-scale industrial dataset designed for the New York subway system. Links run from the design documents (SSDD) on hardware and software components to the software requirement specifications (SRS), specified as use cases and features.

IBS dataset

164 requirements, for instance:

9064 The system shall be maintained on a regular schedule.

9065 An alert shall be issued for maintenance scheduling time.

71 UML class diagrams, for instance:

Class Diagram: System Maintenance

Automated Tracing Tools

[Figure: software artifact pairs listed in decreasing order of similarity score. Thresholding splits them into a candidate links list (links returned to the user) and a discarded links list (links not presented to the user). Key: true link, false link.]
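The ranking and thresholding step can be sketched in a few lines of Python. The scores, the link names and the threshold value are hypothetical, and treating a score equal to the threshold as a candidate is an assumption of this sketch.

```python
def split_by_threshold(scored_pairs, threshold):
    """Rank artifact pairs by score and split them at the threshold."""
    ranked = sorted(scored_pairs, key=lambda pair: pair[1], reverse=True)
    candidates = [p for p in ranked if p[1] >= threshold]  # shown to the user
    discarded = [p for p in ranked if p[1] < threshold]    # never shown
    return candidates, discarded


# Hypothetical similarity scores, for illustration only.
scored = [("R1-Scheduler", 0.42), ("R1-Truck", 0.03), ("R2-RoutePlanner", 0.17)]
candidates, discarded = split_by_threshold(scored, threshold=0.10)
print(candidates)
print(discarded)
```

Lowering the threshold moves links from the discarded list to the candidate list, which tends to raise recall and lower precision.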

Keywords

All documents are preprocessed:

1) Stop words are eliminated.

2) Words are stemmed to their roots (remove plurals, past tenses, etc.).

3) Documents become vectors of stemmed keywords.

[Figure: network with a query node q, term nodes t1 to t6, and document nodes d1, d2, d3.]

For instance, the document d1 = “I love the Machine Learning course” is reduced to d1 = (love, machine, learn, course).
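The preprocessing steps can be sketched in Python as follows. The stop-word list and the suffix-stripping `toy_stem` function are deliberately tiny stand-ins chosen for this example; they are assumptions, not the stop list or stemmer used by the tools discussed here.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"i", "the", "a", "an", "of", "to", "and", "is", "be", "shall"}


def toy_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix, min_len in (("ing", 6), ("ed", 5), ("s", 4)):
        if (word.endswith(suffix) and len(word) >= min_len
                and not word.endswith("ss")):
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


keywords = preprocess("I love the Machine Learning course")
print(keywords)
```

A production system would normally use an established stemming algorithm and a fuller stop list; the sketch only shows where each step sits in the pipeline.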

Probabilistic Network Models

A Probabilistic Network is a directed acyclic graph (DAG).

[Figure: network with a query node q, term nodes t1 to t6, and document nodes d1, d2, d3.]

The probability value p(dj|q) can be used to measure the relevance of document dj to the query q.

The probability value p(d|q) depends on the terms co-occurring in a document d and in the query q.

Probabilistic model in IR

Measures structural dependencies between artifacts through probability values.

Efficient estimation of conditional probabilities for artifacts in terms of their lexical/semantic structures.

Enables us to incorporate additional information.

Highly adaptable to dynamic changes in the system.

An introduction to Bayesian Networks

A graphical representation of multivariate distributions (typically in high dimensions).

Intuitive interface for modeling:

Facilitates communication between data analysts and the general public.

Helps the design of new models and the elicitation of expert knowledge.

Efficient computations for parameter estimation and model selection.

Modularity allows local computations.

Bayesian framework: learning probabilities from data and prior experience/knowledge.

Directed Acyclic Graphs

Bayes nets use DAGs to represent the set of conditional independencies among the variables of the system.

Nodes ~ variables

Missing arc ~ conditional independence

[Figure: DAG over the variables X1, X2, X3, X4, X5.]

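DAGs (cont'd)

A DAG encodes a set of conditional independence assumptions. This permits a factorization of the joint probability distribution:

$$p(x_1, x_2, x_3, x_4, x_5) = \prod_{i=1}^{5} p(x_i \mid pa(x_i))$$

where $pa(x_i)$ denotes the parents of $X_i$.

Each variable is conditionally independent (c.i.) of its non-descendants given its parents.

[Figure: DAG over the variables X1, X2, X3, X4, X5.]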
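The factorization of a joint distribution into one factor per variable, each conditioned on that variable's parents, can be made concrete with a small Python sketch. The three-variable graph and every probability value below are hypothetical; they are not the five-variable network drawn on the slide.

```python
def joint_probability(assignment, parents, cpt):
    """Multiply p(x_i | parents of x_i) over all variables in the DAG."""
    p = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[pa] for pa in parents[var])
        p *= cpt[var][parent_values][value]
    return p


# Hypothetical DAG: x1 -> x2 and x1 -> x3 (so x2 and x3 are c.i. given x1).
parents = {"x1": (), "x2": ("x1",), "x3": ("x1",)}

# Hypothetical conditional probability tables, keyed by parent values.
cpt = {
    "x1": {(): {True: 0.3, False: 0.7}},
    "x2": {(True,): {True: 0.9, False: 0.1},
           (False,): {True: 0.2, False: 0.8}},
    "x3": {(True,): {True: 0.5, False: 0.5},
           (False,): {True: 0.1, False: 0.9}},
}

p = joint_probability({"x1": True, "x2": True, "x3": False}, parents, cpt)
print(p)  # product of 0.3, 0.9 and 0.5
```

Because each factor only looks at a variable and its parents, the tables stay small even when the full joint distribution has many variables.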


Estimating the Probability of a Link

Probability of a link between a query and a document:

$$\mathrm{prob}(d_j \mid q) = \frac{\sum_i \mathrm{prob}(d_j \mid t_i)\,\mathrm{prob}(q, t_i)}{\mathrm{prob}(q)}$$

Frequency of term $t_i$ with respect to the size of the document:

$$\mathrm{prob}(d_j \mid t_i) = \frac{\mathrm{freq}(d_j, t_i)}{\sum_k \mathrm{freq}(d_j, t_k)}$$

Frequency of term $t_i$ with respect to the query and the inverse of $n_i$ = # of docs containing $t_i$:

$$\mathrm{prob}(q, t_i) = \frac{\mathrm{freq}(q, t_i)}{n_i}$$

Terms appearing in fewer documents provide stronger information about a certain concept.

$$\mathrm{prob}(q) = \sum_i \mathrm{prob}(q, t_i)$$
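A minimal Python sketch of this estimate follows, assuming documents and queries have already been reduced to lists of stemmed keywords. The function name, the tiny corpus, and the choice to skip query terms that occur in no document (where n_i would be zero) are assumptions of this sketch.

```python
from collections import Counter


def link_probability(query_terms, doc_terms, docs):
    """prob(d|q) = sum_i prob(d|t_i) * prob(q,t_i) / prob(q)."""
    q_freq, d_freq = Counter(query_terms), Counter(doc_terms)
    doc_size = sum(d_freq.values())
    numerator = prob_q = 0.0
    for term, freq_q in q_freq.items():
        n_i = sum(1 for d in docs if term in d)  # docs containing the term
        if n_i == 0:
            continue  # assumption: ignore terms absent from the collection
        p_q_t = freq_q / n_i  # prob(q, t_i)
        prob_q += p_q_t
        if doc_size:
            numerator += (d_freq[term] / doc_size) * p_q_t  # prob(d|t_i) part
    return numerator / prob_q if prob_q else 0.0


# Hypothetical stemmed artifacts, for illustration only.
docs = [
    ["schedul", "route", "truck"],
    ["truck", "maintain", "alert", "schedul"],
    ["road", "section", "map"],
]
query = ["schedul", "route"]
scores = [link_probability(query, d, docs) for d in docs]
print(scores)
```

The term "route" occurs in a single document, so it carries twice the weight of "schedul", which occurs in two; this is the effect of dividing by n_i.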

Typical Results

DePaul University

| Data Set | Recall | Precision |
|---|---|---|
| Ice Breaker | 90.4% | 31.7% |
| EBT | 90.8% | 18.3% |
| Light Control | 90.1% | 36.8% |
| Siemens L&A, Rich SUCs to BUCs | 90% | 31% |

Problems

Effective tracing tools must obtain high recall.

Previous results in trace retrieval reveal low precision problems. Precision is typically lower than 25% when recall of 90% is achieved.

Proposed enhancement strategies, including incorporating hierarchical structure information, utilizing a project glossary and soliciting user feedback, usually have certain limitations.

Low Confidence Links

[Figure: frequency (y-axis) of links and non-links plotted against link probability (x-axis, 0 to 0.3).]

High confidence that probabilities in this area constitute true links.

High confidence that probabilities in this area constitute NON-links.

Low confidence area.

Model Enhancements

Hierarchical data provides contextual information that can be used to improve retrieval results.

Term coverage: the extent to which terms in the query co-occur in the searched document.

Phrasing: phrases are considered more accurate at describing the content of a document and can therefore help improve the accuracy of the text retrieval.

[Figure: basic retrieval with a low and a high threshold, followed by the enhancement strategies and an enhanced threshold.]

Above the high threshold: high confidence links, presented to the user for evaluation.

Below the low threshold: high confidence to reject the link. Non-links are discarded and not presented to the user.

Between the two thresholds: low confidence links, sent to the enhancement strategies.

After enhancement: links above the enhanced threshold are enhanced confidence links and are sent to the user; for the rest there is enhanced confidence to reject the links.

The user evaluates candidate links and accepts or rejects them.

Key: true link that should be retrieved; false link that should NOT be retrieved.

Correct links bubble up.

Incorrect links remain below the heightened threshold.
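The two-threshold flow can be sketched in Python as follows. The threshold values, the link names, and the `enhance` callback (here a simple per-link multiplier) are hypothetical placeholders for whichever enhancement strategy is applied.

```python
def triage(probability, low, high):
    """Place a link in one of three zones using two thresholds."""
    if probability >= high:
        return "present"   # high confidence link: show to the user
    if probability < low:
        return "discard"   # high confidence non-link: never shown
    return "enhance"       # low confidence: send to enhancement strategy


def retrieve(scored_links, low, high, enhanced_threshold, enhance):
    """Return the links shown to the user after enhancement."""
    shown = []
    for link, probability in scored_links:
        zone = triage(probability, low, high)
        if zone == "present":
            shown.append(link)
        elif zone == "enhance":
            if enhance(link, probability) >= enhanced_threshold:
                shown.append(link)  # correct links are meant to bubble up
    return shown


# Hypothetical scores and a hypothetical enhancement that boosts link L2.
boosts = {"L2": 2.0}
scored = [("L1", 0.30), ("L2", 0.10), ("L3", 0.09), ("L4", 0.01)]
shown = retrieve(scored, low=0.05, high=0.20, enhanced_threshold=0.15,
                 enhance=lambda link, p: p * boosts.get(link, 1.0))
print(shown)
```

Only links in the low confidence zone are re-scored, so the enhancement cannot demote a high confidence link or resurrect a discarded one.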

Enhancement strategies aimed at low confidence area:

Hierarchical data

Hypothesis: Hierarchical data provides contextual information that can be used to improve retrieval results.

[Figure: a requirements hierarchy (R1 to R5) linked to packages (P1, P2) and classes (C1 to C6). Requirement subgroup: De-icing. Requirement: Services shall be scheduled along pre-defined routes. Packages: De-Icing Scheduler, Truck Maintenance. Classes: Route planner, Scheduler. Links: L1, L2, L3.]

An ancestral link (L1) increases the probability of descendant links such as L2.

Hierarchical Algorithm (two-tier model, document side)

Probability of a link between a query and a document:

$$\mathrm{prob}(d_j \mid q) = \frac{\sum_i \mathrm{prob}(d_j, g \mid t_i)\,\mathrm{prob}(q, t_i)}{\mathrm{prob}(q)}, \qquad g \in pa_D(d_j)$$

$$\mathrm{prob}(q) = \sum_i \mathrm{prob}(q, t_i)$$

prob(dj,g|ti): frequency of term ti with respect to the document and the ancestors of the document.

prob(q,ti): frequency of term ti with respect to its occurrence in the query and the query’s ancestors.

Hierarchical terms

$$\mathrm{prob}(d_j, g \mid t_i) = \frac{\mathrm{freq}(d_j, t_i)}{\sum_k \mathrm{freq}(d_j, t_k)} + w_D \sum_{g \in pa_D(d_j)} \frac{\mathrm{freq}(g, t_i)}{\sum_k \mathrm{freq}(g, t_k)}$$

First term: frequency of term ti with respect to the size of the document.

Second term: frequency of term ti with respect to the size of the ancestral document, normalized over all ancestral documents.

prob(q|ti) = freq(q,ti) / Σk freq(q,tk)

$w_D$: weighting factor.
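One way to read the hierarchical term is as the document's own term share plus a weighted sum of the term's share in each ancestor document. The Python sketch below follows that reading; the weight value, the function names and the example artifacts are assumptions, and the exact normalization used in the original algorithm may differ.

```python
from collections import Counter


def term_share(terms, term):
    """freq(doc, term) divided by the total term count of the document."""
    counts = Counter(terms)
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


def hierarchical_term_prob(term, doc_terms, ancestor_docs, weight):
    """Own term share plus weighted term shares of the ancestor documents."""
    own = term_share(doc_terms, term)
    inherited = sum(term_share(g, term) for g in ancestor_docs)
    return own + weight * inherited


# Hypothetical class and its parent package, as stemmed keyword lists.
scheduler_class = ["schedul", "truck"]
deicing_package = ["deice", "schedul"]

p_deice = hierarchical_term_prob("deice", scheduler_class,
                                 [deicing_package], weight=0.5)
p_schedul = hierarchical_term_prob("schedul", scheduler_class,
                                   [deicing_package], weight=0.5)
print(p_deice, p_schedul)
```

The class never mentions "deice", yet it receives a non-zero value for that term through its package, which is how an ancestral link can raise the probability of a descendant link.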

Results: Hierarchical Data

| Retrieval strategy | IBS Recall | IBS Precision | EBT Recall | EBT Precision | LC Recall | LC Precision |
|---|---|---|---|---|---|---|
| Basic | 90.47% | 20.43% | 90.97% | 17.75% | 90.11% | 37.61% |
| | 95.01% | 16.81% | 95.13% | 16.36% | 93.40% | 32.57% |
| Hierarchical applied to entire dataset | 90.25% | 20.02% | 89.58% | 12.02% | 90.11% | 31.78% |
| | 95.01% | 18.14% | 95.14% | 12.18% | 95.6% | 30.53% |
| Basic + Hierarchical applied to low confidence links | 90.48% | 31.72% | 90.87% | 18.30% | 90.11% | 36.77% |
| | 95.69% | 25.65% | 95.83% | 17.42% | 95.6% | 30.53% |

Callouts on the table:

Significant improvement in precision.

Reduced precision but ability to reach 95% recall.

Minor improvement in precision.

Enhancement Factor 1: Query Term Coverage

Definition: The extent to which terms in query q co-occur in document d.

Hypothesis: Higher query term coverage between a query and a document increases the likelihood of the two artifacts being related.

$$C(q, d) = \frac{m}{t} = \frac{\text{\# of matching terms occurring in both } q \text{ and } d}{\text{total \# of terms in } q}$$
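Query term coverage can be computed with a short Python function. Counting distinct terms (rather than occurrences) is an assumption of this sketch, and the keyword lists are a hypothetical preprocessing of an IBS-style requirement and UML class.

```python
def query_term_coverage(query_terms, doc_terms):
    """Fraction of distinct query terms that also occur in the document."""
    distinct_query = set(query_terms)
    if not distinct_query:
        return 0.0
    matching = distinct_query & set(doc_terms)  # the m matching terms
    return len(matching) / len(distinct_query)


# Hypothetical keywords for "Road section shall be added." and for a
# class named TutorialSection with a getter and a setter.
requirement = ["road", "section", "add"]
uml_class = ["tutorial", "section", "get", "tutorial", "section",
             "set", "tutorial", "section"]

coverage = query_term_coverage(requirement, uml_class)
print(round(coverage, 3))
```

Although "section" appears three times in the class, only one of the three query terms is matched, so the coverage stays low.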

Term coverage example

Requirement: 9065 An alert shall be issued for maintenance scheduling time.

Class Diagram: System Maintenance

Query Term Coverage: Motivation

Observation of retrieval results reveals that a significant number of false positives in the candidate links list contain very few distinct matching terms.

A false positive example from the Ice-Breaker System:

Requirement: Road section shall be added.

UML class: TutorialSection { getTutorialSection() setTutorialSection() }

The basic PN does not capture the term coverage concept and may generate similar probability values for the two links in the following case:

Query: …A…B…C

One document (…A…B…C) matches all three query terms, while the other (…A…A…A) matches only term A. The two documents are labeled d1 and d2 in the figure and give link 1 and link 2.

*Assume terms A, B and C have the same weight.

Query Term Coverage: Motivation

Are false positives more likely to have low term coverage?

| Link Type | Average query term coverage |
|---|---|
| True positives | 0.481 |
| False positives | 0.298 |

Comparison of coverage on IBS: top-100 true positives vs. top-100 false positives.

[Figure: candidate links list marking the top true positives and the top false positives. Key: true link, false link.]

Enhancement Factor 2: Phrases

In a general IR context, phrases are considered more accurate than single words in specifying a document and therefore increase retrieval accuracy.

The IBS false positive example also shows how single-term matching may retrieve false links:

Requirement: Road section shall be added.

UML class: TutorialSection { getTutorialSection() setTutorialSection() }

Phrases are automatically detected using a part-of-speech tagger to search through the whole document collection.

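Incorporating enhancement factors

Enhanced probability $p_{TC}(d \mid q)$ by integrating query term coverage:

$$p_{TC}(d \mid q) = \begin{cases} m \cdot p(d \mid q) & \text{if } p(d \mid q) < 1/m \\ 1 & \text{if } p(d \mid q) \ge 1/m \end{cases}$$

m = # of matching terms occurring in both q and d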
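One reading of this coverage rule is that the basic probability is scaled by the number of matching terms m and capped at 1. The Python sketch below implements that reading only; it is an assumption about a single coverage method, and the function name and sample values are hypothetical.

```python
def coverage_enhanced(p, m):
    """Scale p(d|q) by m matching terms, capping the result at 1."""
    if m < 1:
        return p  # no matching terms: nothing to scale
    return m * p if p < 1.0 / m else 1.0


# Hypothetical basic probabilities, for illustration only.
one_match = coverage_enhanced(0.10, 1)     # single shared term: unchanged
three_matches = coverage_enhanced(0.10, 3) # three shared terms: tripled
capped = coverage_enhanced(0.40, 3)        # 0.40 >= 1/3, so capped at 1
print(one_match, three_matches, capped)
```

Under this reading, two links with the same basic probability are separated once one of them matches more distinct query terms than the other.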

Incorporating enhancement factors

Enhanced probability $p_{PH}(d \mid q)$ by integrating phrases:

$$p_{PH}(d \mid q) = p(d \mid q) + \frac{\sum_{t_i \in S_{PH}} p(d \mid t_i)\, p(q, t_i)}{p(q)}$$

where $p(d \mid q)$ is the probability from the basic PN and $S_{PH}$ is the set of terms $t_i$ that belong to matching phrases found in q and d.

Enhanced probability by integrating both query term coverage and phrases: replace $p(d \mid q)$ with $p_{PH}(d \mid q)$ in one of the three term coverage methods.

Evaluation on IBS dataset (Ice-Breaker System): 164 requirements; 71 UML classes; 420 true links.

Precision comparison of algorithms at recall 90%

| Algorithm | Coverage method A | Coverage method B | Coverage method C |
|---|---|---|---|
| basic PN (no coverage method) | 20.0% | | |
| basic + phrasing (no coverage method) | 21.0% | | |
| basic + coverage | 25.1% | 26.6% | 28.9% |
| basic + phrasing + coverage | 25.0% | 27.1% | 28.9% |

Applying query term coverage significantly increases precision.

Method C of the query term coverage approach achieves the highest improvement.

Applying both the phrasing and coverage methods gains an additional 9% in precision on the top links compared with the coverage method alone.

Evaluation on IBS dataset: precision on the top 5% of the candidate links list

| Algorithm | Top 5% precision |
|---|---|
| Basic PN | 60.50% |
| Basic + Coverage | 81.50% |
| Basic + Coverage + Phrasing | 90.50% |

Applying the phrasing and term coverage algorithms increases the precision of the top links by 30 percentage points.

Higher precision of the top links increases the user’s trust in the traceability tools.

Probability value change for top 100 true links and false links

[Figure: two panels, Basic PN and Basic PN + phrasing + coverage, each plotting probability (0 to 1) for the top 100 true links and false links.]

Applying the phrasing and term coverage algorithms separates true and false links more clearly, which helps the user distinguish between them more easily.

Application of enhancement strategies

[Figure: the basic PN ranks links against a threshold. High confidence links representing true links lie above it. Low confidence links pass through the enhancement strategies (phrasing, query term coverage, project glossary, other enhancement strategies) to give an enhanced result: enhanced links representing true links rise, while low confidence links representing false links do not. Non-links are discarded and not presented to the user. Key: true link, false link.]

Evaluation on PACIS dataset

PACIS (Public Address/Customer Information Screen). Trace from SSDD (System/Subsystem Design Description) to SRS. SSDD contains 2403 components; SRS contains 245 requirements; the traceability matrix is incomplete.

Precision of top 100 links based on manual evaluation:

| Algorithm | Precision |
|---|---|
| Basic PN | 30% |
| Basic + Phrasing | 53% |
| Basic + Coverage | 69% |
| Basic + Phrasing + Coverage | 73% |

Applying both the phrasing and query term coverage approaches retrieves the most true links (73 out of 100).

More true links have been pushed up to the top after applying the enhancement algorithms.

Evaluation on PACIS dataset: top 100 candidate links using the basic PN and the enhanced algorithm

[Figure: two panels, Basic PN and Basic PN + Phrasing + Coverage, each plotting probability (0 to 1) for the top 100 candidate links, with true links and false links marked.]

After applying the enhanced algorithm, the probabilities of 46 true links increased more significantly than those of the false links. More true links have risen to the top of the candidate links list, suggesting that the enhanced algorithm improves the retrieval results.

Conclusions

Success of enhancement factors is highly dependent upon the qualities of the dataset.

Once a training set is available, the enhancement factors can be evaluated and appropriate ones selected.

Future work: It might be possible to evaluate suitability of enhancement factors without a training set.

Future work

What makes a collection of system documentation traceable?

Study various characteristics of requirements documentation and software artifact documents to discover “trace-ability” factors!