Mining Version Histories to Guide Software Changes

26th International Conference on Software Engineering (ICSE), Edinburgh, 28.05.2004
Thomas Zimmermann (with Peter Weißgerber, Stephan Diehl, and Andreas Zeller)
Lehrstuhl Softwaretechnik, Universität des Saarlandes, Saarbrücken

DESCRIPTION

Presented at ICSE 2004.

TRANSCRIPT

Page 1: Mining Version Histories to Guide Software Changes

0/10

26th International Conference on Software Engineering (ICSE), Edinburgh, 28.05.2004

Mining Version Histories to Guide Software Changes

Thomas Zimmermann (with Peter Weißgerber, Stephan Diehl, and Andreas Zeller)

Lehrstuhl Softwaretechnik, Universität des Saarlandes, Saarbrücken

Page 3: Mining Version Histories to Guide Software Changes

1/10

Extending ECLIPSE Preferences

Your task: Extend ECLIPSE with a new preference.

Preferences are stored in the field fKeys[].

Page 6: Mining Version Histories to Guide Software Changes

2/10

Extending ECLIPSE Preferences

What else do you need to change?

Which of the 27,000 files, 20,000 classes, and 200,000 methods of ECLIPSE?

Program analysis. fKeys[] and initDefaults() use the same variables.

– Usage does not induce change.

– Usage can be detected only within program code. ECLIPSE has 12,000 non-JAVA files.

Learning from history. Programmers who changed fKeys[] also changed…

Page 7: Mining Version Histories to Guide Software Changes

3/10

Guiding the Programmer

A) The user inserts a new preference into the field fKeys[].

B) ROSE suggests locations for further changes, e.g. the function initDefaults().

Page 9: Mining Version Histories to Guide Software Changes

4/10

From CVS to Transactions

The ECLIPSE CVS archive has more than 47,000 transactions.
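
CVS records check-ins per file, so transactions have to be reconstructed from the log. The following is a minimal sketch of one common heuristic, assuming that check-ins by the same author with the same log message belong together when they follow each other within a short time window. The `Checkin` record, the `group_into_transactions` helper, and the 200-second default are assumptions for illustration; the slides do not state how ROSE does this.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Checkin:
    """One per-file CVS check-in (hypothetical record layout)."""
    author: str
    message: str
    timestamp: int  # seconds since epoch
    entity: str     # changed file or finer-grained entity


def group_into_transactions(checkins: List[Checkin], window: int = 200) -> List[Set[str]]:
    """Group check-ins with the same author and log message into transactions.

    A check-in joins the open transaction of its (author, message) pair if it
    happened at most `window` seconds after the previous check-in of that
    transaction (sliding window); otherwise it opens a new transaction.
    """
    transactions: List[Set[str]] = []
    open_tx = {}  # (author, message) -> (index into transactions, last timestamp)
    for c in sorted(checkins, key=lambda c: c.timestamp):
        key = (c.author, c.message)
        if key in open_tx and c.timestamp - open_tx[key][1] <= window:
            index = open_tx[key][0]
        else:
            transactions.append(set())
            index = len(transactions) - 1
        transactions[index].add(c.entity)
        open_tx[key] = (index, c.timestamp)
    return transactions


# Made-up check-ins: three belong together, one is by another author,
# and one happens much later.
example = [
    Checkin("alice", "add preference", 1000, "fKeys[]"),
    Checkin("alice", "add preference", 1030, "initDefaults()"),
    Checkin("alice", "add preference", 1100, "plugin.properties"),
    Checkin("bob", "fix typo", 1050, "README"),
    Checkin("alice", "add preference", 5000, "fKeys[]"),
]
transactions = group_into_transactions(example)
```

With these made-up check-ins, the first three changes by alice form one transaction, bob's change forms a second, and alice's late change to fKeys[] forms a third.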


Page 11: Mining Version Histories to Guide Software Changes

5/10

Mining Association Rules

ROSE takes all transactions as input:

T42 = { fKeys[], initDefaults(), …, plugin.properties, … }
T752 = { fKeys[], initDefaults(), …, plugin.properties, … }
T9872 = { fKeys[], initDefaults(), …, plugin.properties, … }
T11386 = { fKeys[], initDefaults(), … }
T20814 = { fKeys[], initDefaults(), …, plugin.properties, … }
T30989 = { fKeys[], initDefaults(), …, plugin.properties, … }
T41999 = { fKeys[], initDefaults(), …, plugin.properties, … }
T47423 = { fKeys[], initDefaults(), …, plugin.properties, … }

...

ROSE mines association rules from these transactions:

{ fKeys[], initDefaults() } ⇒ { plugin.properties }
[Support 7, Confidence 7/8 = 0.875]
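
The support and confidence of this rule can be recomputed from the eight listed transactions. The sketch below leaves out the elided entities ("…") and uses two hypothetical helpers, `support` and `confidence`. It illustrates the definitions and is not ROSE's implementation.

```python
# The eight transactions listed above; elided entities ("…") are left out.
BASE = frozenset({"fKeys[]", "initDefaults()"})
WITH_PROPERTIES = BASE | {"plugin.properties"}

transactions = {
    "T42": WITH_PROPERTIES,
    "T752": WITH_PROPERTIES,
    "T9872": WITH_PROPERTIES,
    "T11386": BASE,  # the only one without plugin.properties
    "T20814": WITH_PROPERTIES,
    "T30989": WITH_PROPERTIES,
    "T41999": WITH_PROPERTIES,
    "T47423": WITH_PROPERTIES,
}


def support(itemset, transactions):
    """Number of transactions that contain every entity of `itemset`."""
    return sum(1 for t in transactions.values() if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


antecedent = BASE
consequent = frozenset({"plugin.properties"})
rule_support = support(antecedent | consequent, transactions)
rule_confidence = confidence(antecedent, consequent, transactions)
print(rule_support, rule_confidence)  # 7 0.875
```

Support counts the transactions that contain both sides of the rule (7). Confidence divides that count by the number of transactions that contain the antecedent (8).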

Page 13: Mining Version Histories to Guide Software Changes

6/10

Effective Mining

The classical association mining approach is to mine all rules:

– Helpful in understanding general patterns.

– Requires high support thresholds (> 2^n possible rules).

– Takes time to compute (3 days and more).

Alternative: mine only matching rules on demand.

Constraints on antecedent. Mine only rules which are related to the situation Σ, e.g. Σ ⇒ X

Single consequent rules. Mine only rules which have a singleton as consequent, e.g. Σ ⇒ {x}

Average runtime of a query: 0.5 seconds.
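
The two restrictions can be sketched as a single pass over the transactions that contain the situation Σ, assuming each transaction is a set of entity names. The function name `mine_on_demand` and the `min_support` and `min_confidence` thresholds are hypothetical, and the sketch makes no claim about how ROSE reaches its query times.

```python
from collections import Counter


def mine_on_demand(situation, transactions, min_support=1, min_confidence=0.5):
    """Return rules situation => {x} as (x, support, confidence), best first.

    Only transactions containing the whole situation are inspected
    (constraint on the antecedent), and each rule has exactly one
    entity x as its consequent (single consequent).
    """
    situation = frozenset(situation)
    matching = [t for t in transactions if situation <= t]
    if not matching:
        return []
    # Count how often each other entity was changed together with the situation.
    counts = Counter(x for t in matching for x in t - situation)
    rules = [
        (x, n, n / len(matching))
        for x, n in counts.items()
        if n >= min_support and n / len(matching) >= min_confidence
    ]
    # Highest confidence first, then highest support, then name for stable output.
    return sorted(rules, key=lambda r: (-r[2], -r[1], r[0]))


# Made-up history mirroring the fKeys[] example, plus one unrelated change.
history = [{"fKeys[]", "initDefaults()", "plugin.properties"}] * 7
history += [{"fKeys[]", "initDefaults()"}, {"README"}]
rules = mine_on_demand({"fKeys[]"}, history)
```

For the situation { fKeys[] }, this yields initDefaults() with support 8 and confidence 1.0, followed by plugin.properties with support 7 and confidence 0.875. Each query counts only the transactions that match Σ and never enumerates the full rule space.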

Page 14: Mining Version Histories to Guide Software Changes

7/10

Precision vs. Recall

What ROSE finds vs. what it should find:

– False positives: found by ROSE, but not relevant.

– Correct predictions: found by ROSE and relevant.

– False negatives: relevant, but not found by ROSE.

Precision. How many of the returned entities are relevant? High precision = few false positives.

Recall. How many relevant entities are returned? High recall = few false negatives.
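
Both measures follow from the overlap between the suggested and the actually changed entities. Here is a minimal sketch with made-up entity names. The helper `precision_recall` is hypothetical, and returning 1.0 for an empty set is an assumed convention that the slides do not specify.

```python
def precision_recall(suggested, expected):
    """Return (precision, recall) for one query.

    suggested: entities returned by the tool
    expected:  entities that actually should be changed
    """
    suggested, expected = set(suggested), set(expected)
    correct = suggested & expected  # correct predictions
    # Assumed convention: an empty set yields 1.0 (no false positives or negatives).
    precision = len(correct) / len(suggested) if suggested else 1.0
    recall = len(correct) / len(expected) if expected else 1.0
    return precision, recall


suggested = {"initDefaults()", "plugin.properties", "A.java", "B.java"}
expected = {"initDefaults()", "plugin.properties", "C.java", "D.java", "E.java"}
precision, recall = precision_recall(suggested, expected)
print(precision, recall)  # 0.5 0.4
```

Two of the four suggestions are correct (precision 0.5), and two of the five expected entities are found (recall 0.4).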

Page 16: Mining Version Histories to Guide Software Changes

8/10

Evaluation

The programmer has changed one single entity. Can ROSE suggest other entities that should be changed?

Granularity        Entities                    Files
Project    Recall  Precision  Top3    Recall  Precision  Top3
ECLIPSE    0.15    0.26       0.53    0.17    0.26       0.54
GCC        0.28    0.39       0.89    0.44    0.42       0.87
GIMP       0.12    0.25       0.91    0.27    0.26       0.90
JBOSS      0.16    0.38       0.69    0.25    0.37       0.64
JEDIT      0.07    0.16       0.52    0.25    0.22       0.68
KOFFICE    0.08    0.17       0.46    0.24    0.26       0.67
POSTGRES   0.13    0.23       0.59    0.23    0.24       0.68
PYTHON     0.14    0.24       0.51    0.24    0.36       0.60
Average    0.15    0.26       0.64    0.26    0.30       0.70

ROSE predicts 15% of all changed entities (files: 26%).

In 64% of all transactions, ROSE’s topmost three suggestions contain a correct entity (files: 70%).
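
A Top3 figure of this kind can be read as the share of queries whose three highest-ranked suggestions include at least one entity that was actually changed. The sketch below uses made-up data. The helpers `top3_hit` and `top3_rate` are hypothetical, and counting a query without suggestions as a miss is an assumption, not a detail taken from the evaluation.

```python
def top3_hit(ranked_suggestions, expected):
    """True if any of the three highest-ranked suggestions was actually changed."""
    return any(s in expected for s in ranked_suggestions[:3])


def top3_rate(cases):
    """Share of (ranked_suggestions, expected) cases with a hit among the top three."""
    return sum(top3_hit(r, e) for r, e in cases) / len(cases)


cases = [
    (["a", "b", "c", "d"], {"c"}),  # hit at rank 3
    (["a", "b", "c", "d"], {"d"}),  # correct entity only at rank 4: miss
    (["x"], {"x", "y"}),            # hit at rank 1
    ([], {"z"}),                    # no suggestions: counted as a miss (assumption)
]
rate = top3_rate(cases)  # 2 hits out of 4 cases
```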

Page 19: Mining Version Histories to Guide Software Changes

9/10

Challenges

Further Data Sources. Test outcomes, mailing lists, newsgroups, chat logs. How do we leverage these sources?

Further Analyses. Program analysis, sequence analysis, clustering. How do we integrate different analyses?

From Locations to Actions. You have extended fKeys[] with UI_SPLINES; ROSE suggests:

Insert store.setDefaults(UI_SPLINES, false); in function initDefaults();

The user can accept this at the touch of one button.

How much can we learn from history?

Page 20: Mining Version Histories to Guide Software Changes

10/10

Conclusion

★ ROSE detects coupling between non-program entities (e.g. programs and documentation).

★ ROSE effectively guides users along related changes.

★ In 64% of all transactions, ROSE’s topmost three suggestions contain a correct entity (files: 70%).

★ Research has just begun to exploit non-program artefacts:

– Similar results by A. Ying (2004); A. Hassan (2004); and J. Sayyad-Shirabad (2003).

– ICSE Workshop on Mining Software Repositories, 2004.

★ ROSE will be available as an ECLIPSE plug-in in Fall 2004:

http://www.st.cs.uni-sb.de/softevo/