Identifying Reasons for Software Changes Using Historic Databases
The CISC 864 Analysis
By Lionel Marks
Purpose of the Paper
Using the textual description of a change, try to understand why that change was performed (Adaptive, Corrective, or Perfective)
Observe how difficulty, size, and interval vary across the different types of changes
Three Different Types of Changes
Traditionally, the three types of changes are (taken from the ELEC 876 slides): adaptive is restructuring code to accommodate future changes, corrective is fixing faults, and perfective is adding new features wanted by the customer
Three Types of Changes in This Paper
Adaptive: Adding new features wanted by the customer (switched with perfective)
Corrective: Fixing faults
Perfective: Restructuring code to accommodate future changes (switched with adaptive)
They did not say why they changed these definitions
The Case Study Company
This paper did not divulge the company it used for its case study
It is an actual business
Developer names and actions were kept anonymous in the study
This allowed them to study a real system that has lasted for many years and has a large (and old) version control system
Structure of the ECMS
The company's source code control system is the ECMS (Extended Change Management System)
MRs vs. Deltas
Each MR (Modification Request) could have multiple deltas of changes to one file
Delta: each time a file was "touched"
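The ECMS data model is only described in prose here; as a rough illustration, a minimal Python sketch of the MR/delta relationship, with made-up field names (the actual ECMS schema is not given):

```python
# Minimal sketch of the MR/delta relationship: an MR carries a free-text
# abstract and one delta per "touch" of a file, possibly several per file.
# Field names are hypothetical illustrations, not the ECMS schema.
from dataclasses import dataclass, field

@dataclass
class Delta:
    file: str            # the file that was "touched"
    lines_added: int
    lines_deleted: int

@dataclass
class MR:
    abstract: str                          # description written by the developer
    deltas: list[Delta] = field(default_factory=list)

mr = MR(abstract="fix null pointer in report module",
        deltas=[Delta("report.c", 12, 3), Delta("report.h", 1, 0)])
print(len(mr.deltas))   # one MR, two deltas
```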
The Test System
Called "System A" for anonymity purposes
Has: 2M lines of source code, 3000 files, 100 modules
Over the last 10 years: 33,171 MRs, with an average of 4 deltas each
How they Classified Maintenance Activities (Adaptive, Corrective, Perfective)
If you were given this project, you have:
The CVS repository, and access to the descriptions along with commits
The goal of labelling each commit as “Adaptive”, “Corrective”, or “Perfective”.
What would you intuitively study in the descriptions?
How they Classified Maintenance Activities (Adaptive, Corrective, Perfective)
They had a 5-step process:
1. Cleanup and normalization
2. Word Frequency Analysis
3. Keyword Clustering and Classification
4. MR abstract classification
5. Repeat analysis from step 2 on unclassified MR abstracts
Step 1: Cleanup and Normalization
Their approach used WordNet
Software that eliminates prefixes and suffixes to get back to the root word, e.g. "fixing" and "fixes" both reduce to the root word "fix"
WordNet also has a synonym feature, but it was not used
Synonyms would be hard to correlate properly to the context of software maintenance, and could be misinterpreted
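The slides do not show the tooling; a minimal sketch of this kind of root-word normalization, using NLTK's WordNet lemmatizer as a stand-in (an assumption; the paper only says WordNet was used):

```python
# Sketch of Step 1 (cleanup and normalization): lowercase, strip
# punctuation, and reduce each word to its root so that "fixing" and
# "fixes" both become "fix". Requires nltk.download("wordnet").
import re
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def normalize(description: str) -> list[str]:
    """Return the list of root words in one MR abstract."""
    words = re.findall(r"[a-z]+", description.lower())
    # Lemmatize as verbs so common maintenance words collapse to one root.
    return [lemmatizer.lemmatize(w, pos="v") for w in words]

print(normalize("Fixing the crash; also fixes the build"))
# words like "fixing" and "fixes" both normalize to "fix"
```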
Step 2: Word Frequency Analysis
Determine the frequency of a set of words in the descriptions (a histogram for each description)
What words in the English language would be "neutral" to these classifications and be noise in this experiment?
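A possible sketch of the frequency count over the normalized abstracts; the sample abstracts and the use of Python's collections.Counter are illustrative assumptions, not the paper's implementation:

```python
# Sketch of Step 2: count how often each word appears across the
# normalized MR abstracts (the output of Step 1). The abstracts below
# are made-up examples.
from collections import Counter

abstracts = [
    ["fix", "null", "pointer", "error"],
    ["add", "new", "report", "feature"],
    ["cleanup", "of", "old", "include", "file"],
]

frequencies = Counter(word for abstract in abstracts for word in abstract)
print(frequencies.most_common(5))
```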
Step 3: Keyword Clustering
Classification was done by a human reading the descriptions of 20 randomly selected changes for each selected term in their set, such as "cleanup" meaning perfective maintenance
If a word matched its candidate classification in fewer than 75% of the sampled cases, it was deemed "neutral"
They found that "rework" was used a lot during "code inspection" (a new classification)
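One way the 75% threshold could be applied, as a sketch; the data layout and the helper name keyword_status are hypothetical, and the human labels stand in for the manual reading step:

```python
# Sketch of the Step 3 rule: keep a keyword for a class only if it agrees
# with the human label in at least 75% of a random sample of MRs that
# contain it; otherwise mark the keyword "neutral".
import random

def keyword_status(keyword, candidate_class, labelled_mrs, sample_size=20):
    """labelled_mrs: list of (abstract_words, human_class) pairs."""
    containing = [(words, cls) for words, cls in labelled_mrs if keyword in words]
    if not containing:
        return "neutral"
    sample = random.sample(containing, min(sample_size, len(containing)))
    agree = sum(1 for _, cls in sample if cls == candidate_class)
    return candidate_class if agree / len(sample) >= 0.75 else "neutral"
```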
Step 4: MR Classification Rules
Like the "hard-coded" answer when the learning algorithm fails
If an inspection word is found, the MR is deemed an inspection classification
If fix, bug, error, fixup, or fail are present, the change is corrective
If more than one type of keyword is present, the dominating frequency wins
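A sketch of these precedence rules; only the corrective keywords (fix, bug, error, fixup, fail) and the inspection override come from the slides, while the adaptive and perfective keyword lists are placeholders:

```python
# Sketch of the Step 4 rules: inspection keywords override everything,
# otherwise the keyword type with the dominating frequency wins.
from collections import Counter

KEYWORDS = {
    "inspection": {"inspection", "rework"},
    "corrective": {"fix", "bug", "error", "fixup", "fail"},   # from the slides
    "adaptive":   {"add", "new", "feature"},                   # assumed examples
    "perfective": {"cleanup", "restructure"},                  # assumed examples
}

def classify_mr(words):
    """words: normalized abstract of one MR (output of Steps 1-2)."""
    counts = Counter(words)
    if any(counts[w] for w in KEYWORDS["inspection"]):
        return "inspection"
    totals = {cls: sum(counts[w] for w in kws)
              for cls, kws in KEYWORDS.items() if cls != "inspection"}
    if not any(totals.values()):
        return "unclassified"
    return max(totals, key=totals.get)

print(classify_mr(["fix", "null", "pointer", "error"]))   # -> corrective
```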
Step 5: Cycle Back to Step 2
As in Step 2, you cannot cover the frequency of every word in the documents all at once, so take some more words now
Perform more "learning" and see if new frequent terms fit
Use static rules to resolve unclassified descriptions
When all else failed, the change was considered to be corrective
Case Study: Compare Against Human Classification
20 candidates, 150 MRs
More than 61% of the time, the tool and the human classifiers came to the same classification
Kappa and ANOVA were used to show significance in the results
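For reference, a sketch of how raw agreement and Cohen's kappa can be computed; the label lists are made-up examples, and scikit-learn is an assumption (the slides do not name the statistics tooling):

```python
# Sketch of the agreement check: raw agreement and Cohen's kappa between
# the tool's labels and one human rater's labels on the same MRs.
from sklearn.metrics import cohen_kappa_score

tool_labels  = ["corrective", "adaptive", "corrective", "perfective", "inspection"]
human_labels = ["corrective", "adaptive", "perfective", "perfective", "inspection"]

raw_agreement = sum(t == h for t, h in zip(tool_labels, human_labels)) / len(tool_labels)
kappa = cohen_kappa_score(tool_labels, human_labels)
print(f"raw agreement = {raw_agreement:.2f}, kappa = {kappa:.2f}")
```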
How Purposes Affect Size and Interval
Corrective and adaptive changes had the lowest change intervals
New code development and inspection changes added the most lines
Inspection changes deleted the most lines
Differences in the distribution functions are significant at the 0.01 level
ANOVA showed significance as well, but is inappropriate due to the skewed distributions
Change Difficulty
20 candidates, 150 MRs
Goal: to model the difficulty of each MR. Is the classification significant?
Modeling Difficulty
Modeling of size: deltas (number of files touched)
Difficulty changed with the number of deltas, except in corrective and perfective (changes in SW/HW) changes
Length of time was modeled in difficulty as well
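One way such a difficulty model could be set up, as a sketch; the toy data, column names, and the use of an ordinary least squares fit from statsmodels are assumptions, not the paper's actual model:

```python
# Sketch: regress reported difficulty on MR size (deltas), open interval,
# and change type, in the spirit of the slide. All data here is made up.
import pandas as pd
import statsmodels.formula.api as smf

mrs = pd.DataFrame({
    "difficulty": [1, 2, 3, 2, 4, 3, 1, 4],      # rating given by the developer
    "deltas":     [1, 3, 8, 2, 10, 6, 1, 9],      # number of deltas in the MR
    "interval":   [2, 5, 20, 4, 30, 15, 3, 25],   # days the MR stayed open
    "type": ["corrective", "adaptive", "new", "corrective",
             "new", "adaptive", "corrective", "new"],
})

model = smf.ols("difficulty ~ deltas + interval + C(type)", data=mrs).fit()
print(model.params)   # inspect the fitted coefficients per predictor
```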
Likes and Dislikes of this Paper
Likes:
The algorithm used to make classifications was a good way to break down the problem
The accumulation graphs were interesting
Their use of a real company is also a breath of fresh air: real data!
Dislikes:
Asking developers months after the work how hard the changes were; there is no better way at the moment, but results can be skewed with time
Because a real company was used, the anonymity made the product comparison in the paper less interesting