Raffaella Settimi
School of CTI
In collaboration with Dr. Jane Huang and PhD student Xuchang (Cindy) Zou
Funded by NSF grant #: CCR-0306303
Automated Requirements Traceability: an application of IR techniques in Software Engineering
Outline
Requirements traceability: motivation and challenges.
Poirot: the tracemaker ©
Probabilistic methods for automated tracing
Extensions to the basic algorithm to improve precision of results
Future work: What makes a software system traceable?
Requirements Traceability
“Requirements traceability is the ability to describe and follow the life of a requirement, in both a forward and backward direction, i.e. from its origins, through its development and specification, to its subsequent deployment and use, and through periods of ongoing refinement and iteration in any of these phases.” (Gotel and Finkelstein, 1994.)
Requirements Traceability - Goals
To verify that a generated software system satisfies the specified requirements.
To verify that all requirements have been implemented by the end of the lifecycle.
To analyze the impact of proposed changes on the system.
To support change management and impact analysis in evolving software systems.
The Challenge
Traceability structures such as matrices are VERY HARD to maintain.
Too many traceability links result in an unwieldy tangle of useless information.
Many current traceability techniques do not provide sufficient ‘typing’ to support automation, reducing the value of the traceability effort.
Current approaches require intensive human effort.
Traceability is a hard sell because it is often perceived to have an insufficient ROI.
Challenges ctd.
Objects change and evolve as the system is developed (40-90% of development costs).
Different levels of traceability, typically determined by project-level directives: coarse-grained and fine-grained.
Dynamic Generation of Traceability Links
Goal: identify and retrieve the set of artifacts in a software system impacted by new requirements on an “as-needed” basis.
Action: dynamically generate a traceability scheme to identify directly and indirectly related artifacts.
Method: identify potential links by applying Information Retrieval (IR) techniques.
/*
 * Notification_Processing.java
 * @author Fuhu Liu
 * @version 5.0
 * Changed log file to log DB
 */
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import java.io.*;
import java.util.*;
import java.sql.*;
....
void UpdateDisplayList() {
    listModel.removeAllElements();
    String mSQL = "SELECT distinct" + " SubscriberName FROM EventDetails";
    try {
        rs = stmt.executeQuery(mSQL);
        while (rs.next()) {
            String SubsName = rs.getString(1);
            listModel.addElement(SubsName);
        }
        rs.close();
    } catch (Exception e) {
        System.out.println("Notification_Processing: Problem with query: " + e);
    }
}
An example of traceable artifacts and selected links
Automated Requirements Traceability
The analyst searches through the software artifacts in the system to identify the ones that satisfy a certain requirement or that will potentially be impacted by any change in this requirement.
Automated traceability methods decrease the effort needed to construct and maintain a set of traceability links and provide traceability across a much broader set of documents.
Equivalent to “googling” a requirement and searching through the software artifacts.
Trace Retrieval
Software artifacts such as requirements, design documents and source code contain large amounts of textual information.
IR techniques can be used to search for links between software artifacts.
Dynamic trace retrieval alleviates problems related to explicit trace creation and maintenance.
Modern Information Retrieval: An overview
IR systems rank documents by estimating their relevance to a user query.
A numeric score (similarity index) that measures the similarity in content between a document and the user query is assigned to each document.
Results are measured by:
recall = proportion of correctly retrieved links out of all correct links
precision = proportion of correctly retrieved links out of all retrieved links
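The two measures can be illustrated with a small Python sketch. The link identifiers and the helper name `precision_recall` are made up for illustration and are not taken from any of the datasets or tools discussed here.

```python
def precision_recall(retrieved, correct):
    """Return (precision, recall) for a set of retrieved links.

    retrieved: links returned by the tool (hypothetical identifiers)
    correct:   links known to be true (the reference trace matrix)
    """
    retrieved, correct = set(retrieved), set(correct)
    hits = retrieved & correct  # correctly retrieved links
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical (requirement, class) pairs.
true_links = {("R1", "C1"), ("R1", "C2"), ("R2", "C3"), ("R3", "C1")}
retrieved_links = {("R1", "C1"), ("R2", "C3"), ("R2", "C1"),
                   ("R3", "C2"), ("R3", "C1")}

precision, recall = precision_recall(retrieved_links, true_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the five retrieved links are correct and three of the four true links are found, so precision is 0.60 and recall is 0.75.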
Some approaches
Several research prototypes such as:
POIROT (Probabilistic approach) (http://golevka.cstcis.cti.depaul.edu/Poirot): IWPSE’04, ICSE’05, IEEE Computer’07 (Cleland-Huang, Settimi, et al.)
RETRO (Vector Space Model): RE’03, RE’04 (Hayes, Dekhtyar et al.)
Latent Semantic Indexing
In general, at recall levels of 90-95%, precision levels are around 10-40%.
Several Datasets for testing
Hard to find… we have 7 software projects, including:
IBS (Ice-breaker System) describes the requirements and design of a public-works department system for managing road de-icing. 164 requirements and 71 UML classes.
PACIS (Public Address/Customer Information Screens) is a large-scale industrial dataset designed for the New York subway system. Links run from the design documents (SSDD) on hardware and software components to software requirement specifications (SRS) specified as use cases and features.
IBS dataset
164 requirements, for instance:
9064 The system shall be maintained on a regular schedule.
9065 An alert shall be issued for maintenance scheduling time.
71 UML class diagrams, for instance: Class Diagram: System Maintenance
Automated Tracing Tools
[Figure: software artifact pairs listed in decreasing order of similarity score. Thresholding splits the list into a candidate links list (links returned to the user) and a discarded links list (links not presented to the user). Key: true link, false link.]
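The thresholding step can be sketched in a few lines of Python. The scores, the threshold value and the function name `split_by_threshold` are illustrative assumptions, not values or names used by any particular tracing tool.

```python
def split_by_threshold(scored_links, threshold):
    """Rank artifact pairs by score and split them at the threshold.

    scored_links: iterable of (link, similarity_score) tuples
    Returns (candidate_links, discarded_links), both in decreasing score order.
    """
    ranked = sorted(scored_links, key=lambda item: item[1], reverse=True)
    candidates = [item for item in ranked if item[1] >= threshold]
    discarded = [item for item in ranked if item[1] < threshold]
    return candidates, discarded


# Hypothetical similarity scores for (requirement, class) pairs.
scored = [(("R1", "C1"), 0.42), (("R1", "C2"), 0.03),
          (("R2", "C3"), 0.17), (("R2", "C1"), 0.08)]

candidates, discarded = split_by_threshold(scored, threshold=0.1)
print("returned to the user:", candidates)
print("not presented:", discarded)
```

Lowering the threshold moves more pairs into the candidate list, which tends to raise recall at the cost of precision.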
Keywords
All documents are preprocessed:
1) Stop words are eliminated.
2) Words are stemmed to their roots (remove plurals, past tenses, etc.).
3) Documents become vectors of stemmed keywords.
[Figure: network linking the query q to the terms t1 ... t6 and the documents d1, d2, d3.]
For instance, the document d1 = “I love the Machine Learning course” is reduced to d1 = (love, machine, learn, course).
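A minimal Python sketch of these preprocessing steps follows. The stop-word list and the suffix-stripping rule in `crude_stem` are deliberately tiny stand-ins chosen for this example; a real tool would more likely use a full stop-word list and an established stemmer such as Porter's.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"i", "the", "a", "an", "is", "are", "be", "shall",
              "of", "to", "for", "on", "and", "in"}


def crude_stem(word):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix, min_len in (("ing", 6), ("ed", 5), ("s", 4)):
        if (word.endswith(suffix) and len(word) >= min_len
                and not word.endswith("ss")):
            return word[:-len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("I love the Machine Learning course"))
# prints ['love', 'machine', 'learn', 'course']
```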
Probabilistic Network Models
A Probabilistic Network is a directed acyclic graph (DAG).
The probability value p(dj|q) can be used to measure the relevance of document dj to the query q.
The probability value p(d|q) depends on the terms co-occurring in a document d and in the query q.
[Figure: network linking the query q to the terms t1 ... t6 and the documents d1, d2, d3.]
Probabilistic model in IR
Measures structural dependencies between artifacts through probability values.
Efficient estimation of conditional probabilities for artifacts in terms of their lexical/semantic structures.
Enables us to incorporate additional information.
Highly adaptable to dynamic changes in the system.
An introduction to Bayesian Networks
A graphical representation of multivariate distributions (typically in high dimensions).
Intuitive interface for modeling:
  Facilitates communication between data analysts and the general public.
  Helps the design of new models and the elicitation of expert knowledge.
Efficient computations for parameter estimation and model selection:
  Modularity allows local computations.
Bayesian framework: learning probabilities from data and prior experience/knowledge.
Directed Acyclic Graphs
Bayes nets use DAGs to represent the set of conditional independencies among the variables of the system.
Nodes ~ variables
Missing arc ~ conditional independence
[Figure: DAG over the variables X1, X2, X3, X4, X5.]
DAGs (continued)
A DAG encodes a set of conditional independence assumptions. This permits a factorization of the joint probability distribution:
$$p(x_1, x_2, x_3, x_4, x_5) = \prod_{i=1}^{5} p(x_i \mid pa(x_i))$$
where pa(x_i) denotes the parents of X_i.
Each variable is conditionally independent (c.i.) of its non-descendants given its parents.
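Estimating the Probability of a Link
Probability of a link between a query and a document:
$$\mathrm{prob}(d_j \mid q) = \frac{\sum_i \mathrm{prob}(d_j \mid t_i)\,\mathrm{prob}(q, t_i)}{\mathrm{prob}(q)}$$
Frequency of term t_i with respect to the size of the document:
$$\mathrm{prob}(d_j \mid t_i) = \frac{\mathrm{freq}(d_j, t_i)}{\sum_k \mathrm{freq}(d_j, t_k)}$$
Frequency of term t_i with respect to the query and the inverse of n_i = # of docs containing t_i:
$$\mathrm{prob}(q, t_i) = \frac{\mathrm{freq}(q, t_i)}{n_i}$$
Terms appearing in fewer documents provide stronger information about a certain concept.
Normalizing term:
$$\mathrm{prob}(q) = \sum_i \mathrm{prob}(q, t_i)$$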
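The estimate above can be sketched in Python as follows. This is one reading of the slide formulas under stated assumptions: documents and queries are already preprocessed into lists of stemmed terms, the function names are hypothetical, and the toy documents are invented for the example. It is not the Poirot implementation.

```python
from collections import Counter


def document_frequencies(docs):
    """n_i: number of documents in the collection that contain each term."""
    counts = Counter()
    for terms in docs.values():
        counts.update(set(terms))
    return counts


def link_probability(query_terms, doc_terms, doc_freq):
    """prob(d|q) = sum_i prob(d|t_i) * prob(q,t_i) / prob(q)."""
    q_counts, d_counts = Counter(query_terms), Counter(doc_terms)
    doc_size = sum(d_counts.values())
    # prob(q, t_i) = freq(q, t_i) / n_i, skipping terms absent from the collection
    p_q_t = {t: q_counts[t] / doc_freq[t]
             for t in q_counts if doc_freq.get(t, 0) > 0}
    p_q = sum(p_q_t.values())
    if doc_size == 0 or p_q == 0:
        return 0.0
    # prob(d | t_i) = freq(d, t_i) / sum_k freq(d, t_k)
    return sum((d_counts[t] / doc_size) * p for t, p in p_q_t.items()) / p_q


# Toy collection of preprocessed artifacts (invented for illustration).
docs = {
    "d1": ["route", "schedule", "truck"],
    "d2": ["truck", "maintenance", "truck", "alert"],
}
query = ["schedule", "route"]

n = document_frequencies(docs)
scores = {name: link_probability(query, terms, n) for name, terms in docs.items()}
print(scores)
```

In this toy collection d1 shares both query terms and d2 shares none, so d1 is ranked first with a score of 1/3 and d2 scores 0.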
Typical Results
DePaul University
Data Set                         Recall   Precision
Ice Breaker                      90.4%    31.7%
EBT                              90.8%    18.3%
Light Control                    90.1%    36.8%
Siemens L&A Rich SUCs to BUCs    90%      31%
Problems
Effective tracing tools must obtain high recall.
Previous results in trace retrieval reveal low-precision problems. Precision is typically lower than 25% when recall of 90% is achieved.
Proposed enhancement strategies, including incorporating hierarchical structure information, utilizing a project glossary and soliciting user feedback, usually have certain limitations.
Low Confidence Links
[Figure: frequency distribution of link probability values for links and non-links. Horizontal axis: Probability; vertical axis: Frequency.]
High confidence that probabilities in this area constitute true links.
High confidence that probabilities in this area constitute NON-links.
Low confidence area.
Model Enhancements
Hierarchical data provides contextual information that can be used to improve retrieval results.
Term coverage: the extent to which terms in the query co-occur in the searched document.
Phrasing: phrases are considered more accurate at describing the content of a document and can therefore help improve the accuracy of text retrieval.
Enhancement strategies aimed at low confidence area
[Figure: two-stage thresholding of the basic retrieval results.
Above ThresholdHigh: high confidence links, presented to the user for evaluation.
Between ThresholdLow and ThresholdHigh: low confidence links, sent to the enhancement strategies. Relative to ThresholdEnhanced, these become either enhanced confidence links, sent to the user, or links rejected with enhanced confidence.
Below ThresholdLow: high confidence to reject the link. Non-links are discarded and not presented to the user.
The user evaluates candidate links and accepts or rejects them.
Key: true link that should be retrieved; false link that should NOT be retrieved.]
Correct links bubble up.
Incorrect links remain below the heightened threshold.
Hierarchical data
Hypothesis: Hierarchical data provides contextual information that can be used to improve retrieval results.
[Figure: a requirements hierarchy (R1 to R5) traced to a design hierarchy (packages P1 and P2, classes C1 to C6) through links L1, L2 and L3. Labels: Requirement SubGroup: De-icing; Requirement: Services shall be scheduled along pre-defined routes; Package: De-Icing Scheduler; Package: Truck Maintenance; Class: Route planner; Class: Scheduler.]
An ancestral link (L1) increases the probability of descendant links such as L2.
Hierarchical Algorithm (two-tier model, document side)
Probability of a link between a query and a document:
$$\mathrm{prob}(d_j \mid q) = \frac{\sum_i \mathrm{prob}(d_j, g \mid t_i)\,\mathrm{prob}(q, t_i)}{\mathrm{prob}(q)}, \qquad g \in pa_D(d_j), \qquad \mathrm{prob}(q) = \sum_i \mathrm{prob}(q, t_i)$$
prob(d_j, g | t_i): frequency of term t_i with respect to the document and the ancestors of the document.
prob(q, t_i): frequency of term t_i with respect to its occurrence in the query and the query’s ancestors.
Hierarchical terms
$$\mathrm{prob}(d_j, g \mid t_i) = \frac{\mathrm{Freq}(d_j, t_i)}{\sum_k \mathrm{Freq}(d_j, t_k)} + w_D \sum_{g \in pa_D(d_j)} \frac{\mathrm{Freq}(g, t_i)}{\sum_k \mathrm{Freq}(g, t_k)}$$
First term: frequency of term t_i with respect to the size of the document.
prob(q|t_i) = freq(q, t_i) / \sum_k freq(q, t_k)
Second term: frequency of term t_i with respect to the size of the ancestral document, normalized over all ancestral documents.
w_D: weighting factor.
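The hierarchical term can be sketched in Python under one assumed reading of the slide: the document's own relative term frequency plus a weighted sum of the relative term frequencies in its ancestor documents. The weight value 0.5, the function name and the example terms are assumptions made for illustration only.

```python
from collections import Counter


def relative_frequency(term, terms):
    """freq(doc, term) / sum_k freq(doc, t_k); 0.0 for an empty document."""
    counts = Counter(terms)
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


def hierarchical_term_prob(term, doc_terms, ancestor_docs, weight=0.5):
    """Document term frequency plus weighted ancestor term frequencies.

    ancestor_docs: list of term lists, one per ancestor of the document
    weight:        weighting factor for the ancestors (assumed value)
    """
    own = relative_frequency(term, doc_terms)
    inherited = sum(relative_frequency(term, g) for g in ancestor_docs)
    return own + weight * inherited


# Hypothetical class "Route planner" inside package "De-Icing Scheduler".
class_terms = ["route", "planner"]
package_terms = [["de", "icing", "scheduler", "route"]]

print(hierarchical_term_prob("route", class_terms, package_terms))
print(hierarchical_term_prob("scheduler", class_terms, package_terms))
```

A term that appears only in an ancestor, such as "scheduler" here, still contributes a small non-zero value, which is how an ancestral link can raise the score of a descendant link.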
Results: Hierarchical Data
Retrieval strategy                          IBS                 EBT                 LC
                                            Recall   Precision  Recall   Precision  Recall   Precision
Basic                                       90.47%   20.43%     90.97%   17.75%     90.11%   37.61%
                                            95.01%   16.81%     95.13%   16.36%     93.40%   32.57%
Hierarchical applied to entire dataset      90.25%   20.02%     89.58%   12.02%     90.11%   31.78%
                                            95.01%   18.14%     95.14%   12.18%     95.6%    30.53%
Basic + Hierarchical applied to             90.48%   31.72%     90.87%   18.30%     90.11%   36.77%
low confidence links                        95.69%   25.65%     95.83%   17.42%     95.6%    30.53%
IBS: significant improvement in precision.
EBT: minor improvement in precision.
LC: reduced precision but ability to reach 95% recall.
Enhancement Factor 1: Query Term Coverage
Definition: The extent to which terms in query q co-occur in document d.
Hypothesis: Higher query term coverage between a query and a document increases the likelihood of the two artifacts being related.
$$C(q, d) = \frac{m}{t}$$
where m = # of matching terms occurring in both q and d, and t = total # of terms in q.
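Query term coverage is simple enough to sketch directly in Python. The sketch assumes that "terms" means distinct preprocessed terms, which the slide does not state explicitly, and the function name is hypothetical.

```python
def query_term_coverage(query_terms, doc_terms):
    """C(q, d) = (# distinct query terms also in d) / (# distinct query terms)."""
    query_set = set(query_terms)
    if not query_set:
        return 0.0
    return len(query_set & set(doc_terms)) / len(query_set)


# A query with terms A, B, C against a document repeating A and one containing all three.
query = ["A", "B", "C"]
d1 = ["A", "A", "A"]
d2 = ["A", "B", "C"]

print(query_term_coverage(query, d1))  # one of three query terms matched
print(query_term_coverage(query, d2))  # all three query terms matched
```

Both documents contain three term matches in total, but d2 covers every query term while d1 covers only one, which is the distinction the coverage measure is meant to capture.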
Term coverage example
Requirement: 9065 An alert shall be issued for maintenance scheduling time.
Class Diagram: System Maintenance
Query Term Coverage: Motivation
Observation of retrieval results reveals a significant number of false positives in the candidate links list that contain very few distinct matching terms.
A false positive example from the Ice-Breaker System:
Requirement: Road section shall be added.
UML class: TutorialSection { getTutorialSection() setTutorialSection() }
The basic PN does not capture the term coverage concept and may generate similar probability values for the two links in the following case:
Query: …A…B…C
d1: …A…A…A
d2: …A…B…C
Link 1 and link 2 connect the query to the two documents.
*assume terms A, B and C have the same weight
Query Term Coverage: Motivation
Are false positives more likely to have low term coverage?
Comparison of coverage on IBS: top-100 true positives vs. top-100 false positives in the candidate links list.
Link Type          Average query term coverage
True positives     0.481
False positives    0.298
Enhancement Factor 2: Phrases
In the general IR context, phrases are considered more accurate in specifying a document than single words and therefore increase retrieval accuracy.
Phrases are automatically detected using a part-of-speech tagger to search through the whole document collection.
The IBS false positive example also shows how single-term matching may retrieve false links:
Requirement: Road section shall be added.
UML class: TutorialSection { getTutorialSection() setTutorialSection() }
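Incorporating enhancement factors
Enhanced probability P_TC(d|q) by integrating query term coverage:
$$p_{TC}(d \mid q) = \begin{cases} m \cdot p(d \mid q) & \text{if } p(d \mid q) < 1/m \\ 1 & \text{if } p(d \mid q) \ge 1/m \end{cases}$$
m = # of matching terms occurring in both q and d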
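A short Python sketch of this coverage boost follows. It assumes the piecewise form in which the basic probability is multiplied by m and capped at 1; the slide mentions three coverage methods and this may correspond to only one of them. The function name and the input values are illustrative.

```python
def coverage_enhanced(p_basic, m):
    """Scale the basic probability by the number of matching terms, capped at 1.

    p_basic: probability p(d|q) from the basic probabilistic network
    m:       number of matching terms occurring in both q and d
    """
    if m <= 0:
        return p_basic  # no matching terms: nothing to boost
    return 1.0 if p_basic >= 1.0 / m else m * p_basic


# Two links with the same basic score but different numbers of matching terms.
print(coverage_enhanced(0.1, 1))  # single matching term: score unchanged
print(coverage_enhanced(0.1, 3))  # three matching terms: score roughly tripled
print(coverage_enhanced(0.5, 3))  # capped at 1.0
```

Links that match several distinct query terms are pushed up the ranking, while links that rest on a single matching term keep their basic score.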
Incorporating enhancement factors
Enhanced probability P_PH(d|q) by integrating phrases:
$$p_{PH}(d \mid q) = p(d \mid q) + \sum_{t_i \in S} \frac{p(d \mid t_i)\, p(q, t_i)}{p(q)}$$
p(d|q): probability from the basic PN.
S: the set of terms t_i that belong to matching phrases found in q and d.
Enhanced probability by integrating both query term coverage and phrases: replace p(d|q) with p_PH(d|q) in one of the three term coverage methods.
Evaluation on IBS dataset (Ice-Breaker System)
164 requirements; 71 UML classes; 420 true links.
Precision comparison of algorithms at recall 90%:
Algorithm                    Precision
basic PN                     20.0%
basic+phrasing               21.0%
                             Method A   Method B   Method C
basic+coverage               25.1%      26.6%      28.9%
basic+phrasing+coverage      25.0%      27.1%      28.9%
Applying query term coverage significantly increases precision.
Method C of the query term coverage approach achieves the highest improvement.
Applying both the phrasing and coverage methods gains an additional 9% precision at the top of the candidate list compared with the coverage method alone.
Evaluation on IBS dataset
Precision on the top 5% of the candidate links list:
Algorithm                    Top 5% precision
Basic PN                     60.50%
Basic+Coverage               81.50%
Basic+Coverage+Phrasing      90.50%
Applying the phrasing and term coverage algorithms increases precision of the top links by 30%.
Higher precision of the top links increases the user’s trust in the traceability tools.
Probability value change for top 100 true links and false links
[Figure: two panels, Basic PN and Basic PN + phrasing + coverage, plotting probability values (0 to 1) for the top 100 true links and false links.]
Applying the phrasing and term coverage algorithms separates true and false links more clearly and therefore helps the user distinguish between them more easily.
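Application of enhancement strategies
[Figure: basic PN results pass through a threshold. High confidence links representing true links go to the user. Low confidence links are passed to the enhancement strategies (Phrasing, Query Term Coverage, Project Glossary, other enhancement strategies), producing an enhanced result with enhanced links representing true links, while low confidence links representing false links stay below the threshold. Non-links are discarded and not presented to the user. Key: true link, false link.]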
Evaluation on PACIS dataset
PACIS (Public Address/Customer Information Screen); trace from SSDD (System/Subsystem Design Description) to SRS; SSDD contains 2403 components; SRS contains 245 requirements; matrix incomplete.
Precision of top 100 links based on manual evaluation:
Algorithm                        Precision
Basic PN                         30%
Basic + Phrasing                 53%
Basic + Coverage                 69%
Basic + Phrasing + Coverage      73%
Applying both the phrasing and query term coverage approaches retrieves the most true links (73 out of 100).
More true links have been pushed up to the top after applying the enhancement algorithms.
Evaluation on PACIS dataset
Top 100 candidate links using basic PN and the enhanced algorithm
[Figure: two panels, Basic PN and Basic PN + Phrasing + Coverage, plotting probability values (0 to 1) for the top 100 candidate links, split into true links and false links.]
After applying the enhanced algorithm, the probabilities of 46 true links increased more significantly than those of false links. More true links have risen to the top of the candidate links list, suggesting that the enhanced algorithm improves the retrieval results.
Conclusions
Success of enhancement factors is highly dependent upon the qualities of the dataset.
Once a training set is available, the enhancement factors can be evaluated and appropriate ones selected.
Future work: It might be possible to evaluate suitability of enhancement factors without a training set.
Future work
What makes a collection of system documentation traceable?
Study of various characteristics of requirements documentation and software artifact documents to discover “trace-ability” factors!