rainer schaden june 30, 2016 · 2016-07-13 · what is refactoring? É opdyke - phd thesis (1992)...

Post on 16-Jul-2020

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Threats to the Validity of Refactoring Studies

Rainer SchadenFreie Universität Berlin

June 30, 2016

Outline

Introduction

How to find Refactorings?

Repository Mining Tools

Planned Approach

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 2

Outline

Introduction

How to find Refactorings?

Repository Mining Tools

Planned Approach

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 3

What is Refactoring?

É Opdyke - PhD thesis (1992)

É Fowler - Refactoring: Improving the Design of Existing Code (1999)

Definition“A change made to the internal structure of software to make it easier tounderstand and cheaper to modify without changing its observablebehavior”

— Martin Fowler

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 4

What is Refactoring?

É Opdyke - PhD thesis (1992)É Fowler - Refactoring: Improving the Design of Existing Code (1999)

Definition“A change made to the internal structure of software to make it easier tounderstand and cheaper to modify without changing its observablebehavior”

— Martin Fowler

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 4

What is Refactoring?

É Opdyke - PhD thesis (1992)É Fowler - Refactoring: Improving the Design of Existing Code (1999)

Definition“A change made to the internal structure of software to make it easier tounderstand and cheaper to modify without changing its observablebehavior”

— Martin Fowler

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 4

Why Refactoring?

É Prevents software aging

É Improves:É DesignÉ UnderstandabilityÉ MaintainabilityÉ Developer’s productivity

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 5

Why Refactoring?

É Prevents software agingÉ Improves:

É DesignÉ UnderstandabilityÉ MaintainabilityÉ Developer’s productivity

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 5

Refactoring Tactics

É Floss Refactoring:É FrequentÉ Mixed with other program changes

É Root-Canal Refactoring:É InfrequentÉ LengthyÉ Hardly any other program changes

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 6

Refactoring Tactics

É Floss Refactoring:É FrequentÉ Mixed with other program changes

É Root-Canal Refactoring:É InfrequentÉ LengthyÉ Hardly any other program changes

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 6

Goal of this Thesis

É Examine Refactoring studies with regards to:

É CredibilityÉ RelevanceÉ Validity

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 7

Goal of this Thesis

É Examine Refactoring studies with regards to:É Credibility

É RelevanceÉ Validity

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 7

Goal of this Thesis

É Examine Refactoring studies with regards to:É CredibilityÉ Relevance

É Validity

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 7

Goal of this Thesis

É Examine Refactoring studies with regards to:É CredibilityÉ RelevanceÉ Validity

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 7

Outline

Introduction

How to find Refactorings?

Repository Mining Tools

Planned Approach

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 8

How to find Refactorings?

É Goal: empirical refactoring data

É Four Methods: (Murphy-Hill)É Mining the Commit LogÉ Analyzing Code HistoriesÉ Observing ProgrammersÉ Logging Refactoring Tool Use

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 9

How to find Refactorings?

É Goal: empirical refactoring dataÉ Four Methods: (Murphy-Hill)

É Mining the Commit LogÉ Analyzing Code HistoriesÉ Observing ProgrammersÉ Logging Refactoring Tool Use

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 9

How to find Refactorings?

É Goal: empirical refactoring dataÉ Four Methods: (Murphy-Hill)

É Mining the Commit LogÉ Analyzing Code HistoriesÉ Observing ProgrammersÉ Logging Refactoring Tool Use

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 9

Mining the Commit Log

É Search for "refactor", "rename", etc. in commit messages

É Problems:É Bad commit messagesÉ Floss refactoringÉ Tendency towards certain, large refactorings

É Assumptions:É Developer recalls refactoringÉ And describes it accurately

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 10

Mining the Commit Log

É Search for "refactor", "rename", etc. in commit messagesÉ Problems:

É Bad commit messagesÉ Floss refactoringÉ Tendency towards certain, large refactorings

É Assumptions:É Developer recalls refactoringÉ And describes it accurately

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 10

Mining the Commit Log

É Search for "refactor", "rename", etc. in commit messagesÉ Problems:

É Bad commit messagesÉ Floss refactoringÉ Tendency towards certain, large refactorings

É Assumptions:É Developer recalls refactoringÉ And describes it accurately

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 10

Analyzing Code Histories

É Analyzing versions of source code

É Problems:É Manual inspection is slow and error-proneÉ Comparing non-consecutive versionsÉ Tools only recognize limited number of refactoringsÉ Tools use heuristics

É Assumptions:É Adequate granularity in code historyÉ Appropriate window of observationÉ Heuristics are parameterized correctly

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 11

Analyzing Code Histories

É Analyzing versions of source codeÉ Problems:

É Manual inspection is slow and error-proneÉ Comparing non-consecutive versionsÉ Tools only recognize limited number of refactoringsÉ Tools use heuristics

É Assumptions:É Adequate granularity in code historyÉ Appropriate window of observationÉ Heuristics are parameterized correctly

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 11

Analyzing Code Histories

É Analyzing versions of source codeÉ Problems:

É Manual inspection is slow and error-proneÉ Comparing non-consecutive versionsÉ Tools only recognize limited number of refactoringsÉ Tools use heuristics

É Assumptions:É Adequate granularity in code historyÉ Appropriate window of observationÉ Heuristics are parameterized correctly

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 11

Observing Programmers

É Direct observation: controlled experiment

É Indirect observation: surveyÉ Problems:

É Controlled experiments are expensiveÉ Bad external validity of controlled experimentsÉ Indirect observation relies on memory of developers

É Assumptions:É Developers recall refactorings accuratelyÉ Frequent refactoringsÉ External validity

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 12

Observing Programmers

É Direct observation: controlled experimentÉ Indirect observation: survey

É Problems:É Controlled experiments are expensiveÉ Bad external validity of controlled experimentsÉ Indirect observation relies on memory of developers

É Assumptions:É Developers recall refactorings accuratelyÉ Frequent refactoringsÉ External validity

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 12

Observing Programmers

É Direct observation: controlled experimentÉ Indirect observation: surveyÉ Problems:

É Controlled experiments are expensiveÉ Bad external validity of controlled experimentsÉ Indirect observation relies on memory of developers

É Assumptions:É Developers recall refactorings accuratelyÉ Frequent refactoringsÉ External validity

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 12

Observing Programmers

É Direct observation: controlled experimentÉ Indirect observation: surveyÉ Problems:

É Controlled experiments are expensiveÉ Bad external validity of controlled experimentsÉ Indirect observation relies on memory of developers

É Assumptions:É Developers recall refactorings accuratelyÉ Frequent refactoringsÉ External validity

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 12

Logging Refactoring Tool Use

É Use log files of automated refactoring tools

É Problems:É Requires extensive log filesÉ Can’t track manual refactorings

É Assumptions:É Developers use tools for refactoring

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 13

Logging Refactoring Tool Use

É Use log files of automated refactoring toolsÉ Problems:

É Requires extensive log filesÉ Can’t track manual refactorings

É Assumptions:É Developers use tools for refactoring

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 13

Logging Refactoring Tool Use

É Use log files of automated refactoring toolsÉ Problems:

É Requires extensive log filesÉ Can’t track manual refactorings

É Assumptions:É Developers use tools for refactoring

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 13

Strength and Weaknesses

É Implicit/ExplicitÉ Accuracy and PrecisionÉ ContextÉ Fidelity

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 14

Outline

Introduction

How to find Refactorings?

Repository Mining Tools

Planned Approach

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 15

Ref-Finder

É Prete et al. (2010)É Analyzing versions of source codeÉ Supports 63 refactoringsÉ Logic rules

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 16

Ref-Finder Description

É Input: Two versions of a program

É Basic factsÉ I.e. method(methodFullName, methodShortName, typeFullName)

É Deleted_ und added_ factsÉ Similarbody factsÉ Refactorings are rulesÉ Topological sort

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 17

Ref-Finder Description

É Input: Two versions of a programÉ Basic facts

É I.e. method(methodFullName, methodShortName, typeFullName)

É Deleted_ und added_ factsÉ Similarbody factsÉ Refactorings are rulesÉ Topological sort

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 17

Ref-Finder Description

É Input: Two versions of a programÉ Basic facts

É I.e. method(methodFullName, methodShortName, typeFullName)

É Deleted_ und added_ facts

É Similarbody factsÉ Refactorings are rulesÉ Topological sort

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 17

Ref-Finder Description

É Input: Two versions of a programÉ Basic facts

É I.e. method(methodFullName, methodShortName, typeFullName)

É Deleted_ und added_ factsÉ Similarbody facts

É Refactorings are rulesÉ Topological sort

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 17

Ref-Finder Description

É Input: Two versions of a programÉ Basic facts

É I.e. method(methodFullName, methodShortName, typeFullName)

É Deleted_ und added_ factsÉ Similarbody factsÉ Refactorings are rules

É Topological sort

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 17

Ref-Finder Description

É Input: Two versions of a programÉ Basic facts

É I.e. method(methodFullName, methodShortName, typeFullName)

É Deleted_ und added_ factsÉ Similarbody factsÉ Refactorings are rulesÉ Topological sort

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 17

Ref-Finder Example Rule

Example

É deleted_method(mFullName, mShortName, t1FullName) ∧

É added_method(newmFullName, mShortName, t2FullName) ∧É similar_body(newmFullName, newmBody, mFullName, mBody) ∧É NOT(equals(t1FullName, t2FullName))É → move_method(mShortName, t1FullName, t2FullName)

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 18

Ref-Finder Example Rule

Example

É deleted_method(mFullName, mShortName, t1FullName) ∧É added_method(newmFullName, mShortName, t2FullName) ∧

É similar_body(newmFullName, newmBody, mFullName, mBody) ∧É NOT(equals(t1FullName, t2FullName))É → move_method(mShortName, t1FullName, t2FullName)

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 18

Ref-Finder Example Rule

Example

É deleted_method(mFullName, mShortName, t1FullName) ∧É added_method(newmFullName, mShortName, t2FullName) ∧É similar_body(newmFullName, newmBody, mFullName, mBody) ∧

É NOT(equals(t1FullName, t2FullName))É → move_method(mShortName, t1FullName, t2FullName)

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 18

Ref-Finder Example Rule

Example

É deleted_method(mFullName, mShortName, t1FullName) ∧É added_method(newmFullName, mShortName, t2FullName) ∧É similar_body(newmFullName, newmBody, mFullName, mBody) ∧É NOT(equals(t1FullName, t2FullName))

É → move_method(mShortName, t1FullName, t2FullName)

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 18

Ref-Finder Example Rule

Example

É deleted_method(mFullName, mShortName, t1FullName) ∧É added_method(newmFullName, mShortName, t2FullName) ∧É similar_body(newmFullName, newmBody, mFullName, mBody) ∧É NOT(equals(t1FullName, t2FullName))É → move_method(mShortName, t1FullName, t2FullName)

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 18

Ref-Finder Problems: Missing Refactorings

É Nine of 72 refactorings are not detectedÉ Including “Substitute Algorithm”

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 19

Ref-Finder Problems: Wrong precision

É Precision =|E ∩ R||R|

É Recall = |E ∩ R||E|

Figure : Ref-Finder precision(Source: Prete et al.)

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 20

Ref-Finder Problems: Wrong precision

É Precision =|E ∩ R||R|

É Recall = |E ∩ R||E|

Figure : Ref-Finder precision(Source: Prete et al.)

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 20

Ref-Finder Problems: Wrong precision

É Precision =|E ∩ R||R|

É Recall = |E ∩ R||E|

Figure : Ref-Finder precision(Source: Prete et al.)

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 20

Ref-Finder Problems: Wrong precision

É Precision =|E ∩ R||R|

É Recall = |E ∩ R||E|

Figure : Ref-Finder precision(Source: Prete et al.)

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 21

Ref-Finder Problems: Questionable Recall

Recall:“Since it is hard to find known refactorings, we ran REF-FINDER usingsimilarity threshold σ = 0.65 and manually inspected randomly chosenrefactorings until we found 10 correct refactorings. We then measured arecall against this data set at a more reasonable threshold, σ = 0.85.”

— Prete et al.

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 22

UMLDiff

É Xing and Stroulia (2005)É Structural-differencing algorithmÉ Name-similarity and Structure-similarity heuristicsÉ Supports 33 Refactorings

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 23

RefactoringCrawler

É Dig et al. (2006)É Syntactic analysis using Shingles encodingÉ Semantic analysisÉ Supports seven Refactorings

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 24

Outline

Introduction

How to find Refactorings?

Repository Mining Tools

Planned Approach

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 25

Planned Approach

É Examine precision and recall of Ref-Finder

É Analyze results of studies using Ref-FinderÉ Examine precision and recall of other tools and methodsÉ Analyze results of other studies

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 26

Planned Approach

É Examine precision and recall of Ref-FinderÉ Analyze results of studies using Ref-Finder

É Examine precision and recall of other tools and methodsÉ Analyze results of other studies

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 26

Planned Approach

É Examine precision and recall of Ref-FinderÉ Analyze results of studies using Ref-FinderÉ Examine precision and recall of other tools and methods

É Analyze results of other studies

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 26

Planned Approach

É Examine precision and recall of Ref-FinderÉ Analyze results of studies using Ref-FinderÉ Examine precision and recall of other tools and methodsÉ Analyze results of other studies

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 26

Further possibilities: Time pressure andreleases

Recommendation“The other time you should avoid refactoring is when you are close to adeadline. At that point the productivity gain from refactoring would onlyappear after the deadline and thus be too late.”

— Martin Fowler

É Studies equate time pressure with major releasesÉ Questionable for Open-source software

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 27

Further possibilities: Time pressure andreleases

Recommendation“The other time you should avoid refactoring is when you are close to adeadline. At that point the productivity gain from refactoring would onlyappear after the deadline and thus be too late.”

— Martin Fowler

É Studies equate time pressure with major releasesÉ Questionable for Open-source software

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 27

Further possibilities: Bug-inducing vs.Bug-fix-inducing changes

É Are refactorings really responsible for bugs?É Or do they help finding bugs?

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 28

Summary

É There are four basic methods to gather empirical data aboutrefactoring

É Tools to detect refactorings are necessary but flawedÉ Goal: How valid are the results of refactoring studies?

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 29

For Further Reading I

Martin FowlerRefactoring: Improving the Design of Existing Code.Addison-Wesley Longman Publishing Co., Inc., 1999.

Murphy-Hill, Black, Dig, ParninGathering Refactoring Data: A Comparison of Four MethodsProceedings of the 2nd Workshop on Refactoring Tools, 7:1–7:5, 2008.

Prete, Rachatasumrit, Sudan, KimTemplate-based Reconstruction of Complex RefactoringsProceedings of the 2010 IEEE International Conference on SoftwareMaintenance, 1–10, 2010.

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 30

Thank You!

,

FU Berlin, Threats to the Validity of Refactoring Studies, June 30, 2016 31

top related