assessing and improving rules to support software evolution andré cavalcante hora directeurs :...
TRANSCRIPT
Assessing and Improving Rules to Support Software Evolution
André Cavalcante HoraDirecteurs :
Stéphane Ducasse, Inria Lille Nord Europe, FranceNicolas Anquetil, Université Lille 1, France
2
Outline
1. Context and Motivation2. Problems3. Benefits of System-specific Rules4. Supporting System-specific Conventions with
Automatic Extraction5. Supporting Client Systems with Automatic
Extraction6. Impact of Software Evolution on Ecosystems7. Conclusion
3
Software Evolution (1)
In software development, change is the only constant
New features, bug-fixes, source code refactoring
Up to 80% of the development cost
Part of software ecosystems
4
Software Evolution (2)
In Eclipse, 42% of the methods in version 1.0 were not in 2.0 [MWZ11]
– Clients are clearly impacted
In Pharo, API evolution affected thousands of clients [RLR12]
– A minority of these clients were aware and reacted
5
Software Evolution (3)
How can we ensure software evolution propagation?
In summary:– Software evolution is a complex task– Aggravated in software ecosystems– Developers need help to maintain their systems
6
Ensuring Software Evolution
Software evolution is difficult!
Good practices In practiceBackward-compatibility ✖
[BTF05, DJ06, WGAK10]
Deprecation with helpful messages ✖[RLR12]
Use of only public APIs ✖[BR06,DR08,Bus13,BSvdB13]
7
Ensuring Consistency (1)
Rules are used to ensure source code consistency– Bug detection [LZ05, WH05, KPW06, NNP+10, SSPR12]
– Best coding practices and conventions [HP04, RBJ97, RDGN10]
– Class and method replacement [DR08, SJM08, WGAK10, MWZM12]
8
Ensuring Consistency (2)
The literature provides two solutions to obtain these rules:1. Manually created rules2. Automatically created rules
• Normally validated in small-scale case studies
There are lacks in each solution to support developers in their maintenance tasks effectively
9
Outline
1. Context and Motivation2. Problems3. Benefits of System-specific Rules4. Supporting System-specific Conventions with
Automatic Extraction5. Supporting Client Systems with Automatic
Extraction6. Impact of Software Evolution on Ecosystems7. Conclusion
10
Manually created rules (1)
Generic rules– Created by “non-experts”– Valid for distinct systems– Present in static analysis tools (FindBugs, PMD, SmallLint)
System-specific rules– Created by experts on the system under analysis– Focus on important problems– Defined for each system– Costly
SmallLint
11
Manually created rules (2)
Goal: improve code quality
Bug prevention:– Generic rules: do not prevent bugs [WDA+08, BvdB06, CMSV12, BM08, BM09]
– System-specific rules: not yet investigated
Problem 1:Due to their cost, are system-specific rules worthwhile?
12
Manual → Automatic
Apart from being costly:– Experts may not have the complete knowledge– Experts may no longer be available
It is important to reduce the dependence on experts
→ Second solution: automatically created rules
13
Automatically Created Rules
Extracted from software repositories
Inferred from systematic source code changes
Two researches focus by existing studies:1. System-specific analysis2. Supporting API evolution
14
Automatically Created Rules(System-specific Analysis)
Existing studies– Bug-fix changes [KPW06, NNP+10]
– Rule-fix changes [KE07]
– Comparing the diff between two releases [MWZ11, LZ05]
Revision 1 Revision 2 Revision n
Bug-fix change Bug-fix DB
Rule-fix change Rule-fix DB
Release 1 Release 2
Systematic changesSystematic
changes
15
Automatically Created Rules(System-specific Analysis)
Source code usage conventions spread over time Apache Ant: InputStream.close() → FileUtils.close(*)
Problem 2: System-specific conventions are not extracted from software repositories
6 years
revisions
Rev 10 Rev 50 Rev 70 Rev 90 Rev n
16
Automatically Created Rules(Supporting API Evolution)
Existing studies– Evolution rules [DR08, SJM08, WGAK10, MWZM12]
• Mapping: 1-to-1, 1-to-n, m-to-1, m-to-n• Type: replacement or suggestion• Example: m1() → m2()
revisions
Release 1 Release 2
replacement
m1()✖ m2()✚ m1() → m2()
suggestionm1()m2() m1() → m2()
m1()m2()
17
Automatically Created Rules(Supporting API Evolution)
Problem 3:Not all mapping cases are covered
Mapping
1-to-1 1-to-n, n-to-1, m-to-n
Repl. Sugg. Repl. Sugg.
[DR08] yes yes no no
[SJM08] yes yes no no
[WGAK10] yes no yes no
[MWZM12] yes no yes no
18
API Evolution → Ecosystem Analysis
Approaches to support API evolution are often evaluated in small-scale case studies
As we explained before:– A software system is often part of a bigger ecosystem
19
Software Ecosystem Analysis (1)
Software ecosystems studies– Social aspects [JSW11]
– Dependency between projects [LRL10, BCP+13]
– Visualization [LLGR10]
– Evolution size [GBRM+09]
– Natural ecosystems [MCG14, MCGS14, CMG14]
20
Software Ecosystem Analysis (2)
Software ecosystem impact analysis:– To better understand the real extension of the impact– To understand how the impact can be alleviated
Existing studies:– API stability (Android ecosystem, 10 projects) [MRK13]
– API deprecation (Smalltalk ecosystem, 2,600 projects) [RLR12]
Problem 4:The impact of API evolution on ecosystems is unknown
21
Problems and Contributions
1. Deep understanding of the benefits provided by system-specific rules– We investigate whether such rules are worthwhile enforcing [HADA12]
2. Improvement of automatically extracted rules to support system-specific conventions and API evolution– We propose two solutions to extract better rules from software
repositories [HADV13,HEA+14,HAE+14b,HAE+14a]
3. Analysis of the impact of software evolution on clients– We report on an investigation on the awareness of an ecosystem
about software evolution [HRA+14]
22
Outline
1. Context and Motivation2. Problems3. Benefits of System-specific Rules4. Supporting System-specific Conventions with
Automatic Extraction5. Supporting Client Systems with Automatic
Extraction6. Impact of Software Evolution on Ecosystems7. Conclusion
23
Benefits of System-specific Rules(violations vs. bugs)
Lines of code levelIdentifying lines with:– Bugs– Violations
revisions
method v1 method v2
Fixed Issue #123
1. Identify bug-fix commit
2. Calculate the diff
bugbug
3. Markbugs
violationviolation
with bug without bug
with violation TP FP
without violation FN TN
violation + bug
24
Experiment Results (RQ1)
RQ1 Is there a relation between generic violations and bugs?– H0 Generic violations and bugs are independent– Ha Generic violation and bugs are related– p-value = 0.65, precision = 5%
with bug without bug totalwith violation 55 1,063 1,118
without violation 609 12,689 13,298total 664 13,752 14,416
✔
25
Experiment Results (RQ2)
RQ2 Is there a relation between system-specific violations and bugs?
– H0 System-specific violations and bugs are independent– Ha System-specific violations and bugs are related– p-value < 0.001, precision = 12%
with bug without bug totalwith violation 37 275 312
without violation 627 13,477 14,104total 664 13,752 14,416
✔
26
Generic vs. Specific Rules: Summary (1)
RQ1 Generic rules are not effective enough to be used for bug prevention– Confirming existing studies
RQ2 System-specific rules are more relevant to avoid bugs– Effective to be used for bug prevention
27
Generic vs. Specific Rules: Summary (2)
System-specific rules are worthwhile, but…– They are costly, created by experts– Experts may not have the complete knowledge– Experts may no longer be available
Next: solutions to extract rules from software repositories, reducing the involvement of experts
28
Outline
1. Context and Motivation2. Problems3. Benefits of System-specific Rules4. Supporting System-specific Conventions with
Automatic Extraction5. Supporting Client Systems with Automatic
Extraction6. Impact of Software Evolution on Ecosystems7. Conclusion
29
Supporting System-specific Conventions with Automatically Created Rules
Describing API change conventions as rules:– Incremental revisions– Predefined rule patterns that ensure their quality– Occurrence over distinct revisions
Rule category: recommendation
Found in static analysis tools– Support migration (PMD)– Suggest the use of better APIs (FindBugs)– Improve code legibility (SmallLint)
Apache Ant: InputStream.close() → FileUtils.close(*)
SmallLint
30
Mining System-specific Rules (Overview)
revisions
Rev 1 …Rev 2 Rev 3 Rev n
1. Extracting deltas
deltas
Create rules(apply pattern)
candidate rules
Filter rules rulespatterns
2. Discovering rules
31
1. Extracting Deltas
Format of the changes
deleted-call(context-id, receiver, signature, static) added-call(context-id, receiver, signature, static)deltas
Diff of method foo() between version 1 and 2− rpackage = RPackageOrganizer.default();+ rpackage = RPackage.organizer();
Formatted changesdeleted-call(“foo()-rev2”, “RPackageOrganizer”, “default()”, “isStatic”)added-call(“foo()-rev2”, “RPackage”, “organizer()”, “isStatic”)
32
2. Discovering Rules (Creating Rules)
Example 1: FileDirectory.default() → FileSystem.workingDirectory()
Example 2: obj.noMoreNotificationsFor() → obj.unsubscribe()
Pattern 1: Change receiver and signature, static
pattern1(deletedReceiver, deletedSignature, addedReceiver, addedSignature) =
deleted-call(id, deletedReceiver, deletedSignature, “isStatic”) and
added-call(id, addedReceiver, addedSignature, “isStatic”)
Pattern 2: Change signature, same receiver, non-static
pattern2(deletedSignature, addedSignature) =
deleted-call(id, receiver, deletedSignature, “notStatic”) and
added-call(id, receiver, addedSignature, “notStatic”)
33
2. Discovering Rules (Filtering Rules)
Assess the rules which are likely to be system-specific conventions– We keep rules that occur over different revisions
revisions
Rev 10 Rev 50 Rev 70 Rev 90 Rev n
34
Experiment Setting (1)
RQ1 Are the rules correct to the expert?– With the help of an expert– Rule correctness: correct or incorrect
Delta databaseVersions Pharo 1.4 to 2.0Timeframe April 2012 to March 2013
Revisions 435 #Extracted rules for pattern … total1 2 3 4 5
Candidates rules 25 31 49 304 17 426Filtered rules (≥ 2 revisions)
14 11 3 13 4 45
35
Experiment Results (RQ1)
RQ1 Are the rules correct to the expert?
Example: RPackageOrganizer.default() → RPackage.organizer()
Patterntotal
1 2 3 4 5
Filtered rules 14 11 3 13 4 45
Correct rules 11 (79%) 6 (55%) 1 (33%) 7 (54%) 3 (75%) 28 (62%)
RPackageOrganizer.default() {"WARNING: Since this component can be changed (i.e. for testing),
you should NOT use it directly. Use Rpackage.organizer() instead" . . .
36
System-specific conventions:Summary
RQ1 62% (28 out of 45) of the rules were correct
Java validation (Tomcat, Ant, Lucene):– Two programming languages covered
Lack as to better support client systemsNext, supporting client developers
37
Outline
1. Context and Motivation2. Problems3. Benefits of System-specific Rules4. Supporting System-specific Conventions with
Automatic Extraction5. Supporting Client Systems with Automatic
Extraction6. Impact of Software Evolution on Ecosystems7. Conclusion
38
Supporting Client Systems with Automatically Created Rules
Extracting API evolution rules:– Incremental revisions– Data-mining technique– Formats:
• Mapping: 1-to-1, 1-to-n, n-to-1, m-to-n• Type: replacement and suggestion
39
Mining API Change Rules (Overview)
revisions
Rev 1 …Rev 2 Rev 3 Rev n
1. Extracting deltas
deltas
Select deltas selected deltas
Create rules (data mining)
rules
2. Discovering rules
40
1. Extracting Deltas from Revisions
Format of the changes
deleted-call(context-id, signature) added-call(context-id, signature)deltas
Diff of method foo() between version 1 and 2− self.add(MooseModel.root().add(model));+ self.add(model.install());
Formatted changesdeleted-call(“foo()-rev2”, “MooseModel.root()”)
deleted-call(“foo()-rev2”, “MooseModel.add(MooseModel)”)
added-call(“foo()-rev2”, “MooseModel.install()”)
41
2. Discovering Rules
Request:
1st step: selecting deltas2nd step: discovering rules
foo() a() bar() c()
d() e() f()
a() d() foo() e() bar()
k() foo() n()
foo() bar()
deleted
added
foo() bar()→ Confidence = 75%
n() Confidence = 25%
42
Experiment Setting (1)
Research questions– RQ1 Are the generated rules valid to experts?– RQ2 Which types of rules are generated?
Experiment design (with the help of an expert)– RQ1 Rule validity
• valid, invalid, or don’t know
– RQ2 Rule types• 1-to-1, 1-to-n, n-to-1, m-to-n• replacement, suggestion
43
Experiment Setting (2)
Context– Real systems for which source code history is available– Access to the experts of the systems
Case studiesClasses KLOC Contr. Revisions Deltas Experts
Pharo2 3,246 374 37 435 3,196 2
Pharo3 3,844 412 37 483 4,729 4
Moose 2,617 210 21 512 2,848 4
Glamour 1,117 81 24 598 1,807 1
Roassal 411 29 6 379 1,559 1
Seaside 1,122 97 18 862 3,962 -
44
Experiment Results (RQ1)
RQ1 Are the generated rules valid to experts?
Rules Valid Invalid Don’t know Precision
Pharo2 62 17 20 25 46%
Pharo3 95 68 24 3 74%
Moose 50 43 7 0 86%
Glamour 23 15 8 0 65%
Roassal 36 28 8 0 78%
Seaside 76 61 15 - 80%
45
Experiment Results (RQ2)
RQ2 Which types of rules are generated?
Rules1-to-1 1-to-m, m-to-1, m-to-n
replacement suggestion replacement suggestion
Pharo2 42 16 (38%) 17 (40%) 8 (19%) 1 (2%)
Pharo3 71 26 (37%) 23 (32%) 19 (27%) 3 (4%)
Moose 61 15 (25%) 41 (67%) 2 (3%) 3 (5%)
Glamour 43 14 (33%) 27 (63%) 2 (5%) 0 (0%)
Roassal 15 7 (47%) 7 (47%) 1 (7%) 0 (0%)
Seaside 28 2 (7%) 23 (82%) 3 (11%) 0 (0%)
Total 260 80 (31%) 138 (53%) 35 (13%) 7 (3%)
46
Examples
1. isNil().ifTrue(*) → ifNil(*)
2. Scanner.new().scanTokens(*) → parseAsLiteralToken()
3. intersect(*) → intersectIfNone(*,*)
4. Character.cr() → ROPlatform.current().newLine()
47
Discovering API evolution rules:Summary
RQ1 Precision remained between 46% and 86%
RQ2 Rules 1-to-1, 1-to-n, n-to-1 and m-to-n involving method replacement and suggestion
Also validated in C# case studies (Siemens)
Single systems analysisNext, software ecosystem analysis
48
Outline
1. Context and Motivation2. Problems3. Benefits of System-specific Rules4. Supporting System-specific Conventions with
Automatic Extraction5. Supporting Client Systems with Automatic
Extraction6. Impact of Software Evolution on Ecosystems7. Conclusion
49
Impact of Software Evolution on Ecosystems (1)
The analysis of software evolution on ecosystems help developers:– To better understand the real extension of the impact– To understand how the impact can be alleviated
50
Impact of Software Evolution on Ecosystems (2)
Researchers should also support developers working in software ecosystems
A first large-scale study [RLR12]
– API deprecation– Naturally, API evolution is not restricted to deprecation
51
The Pharo Ecosystem
Choice of Pharo– Relevant: six years of evolution (from 2008 to 2013) with
3,588 systems and 2,874 contributors– Clear inclusion criterion– Comparison with API deprecation [RLR12]
52
Experiment Setting
Frameworks and libraries: Pharo language (118 API rules)
Research questions:– Magnitude– Extension– Consistency
Version1.0 1.4 2.0 3.0
Classes 3,378 3,038 3,345 4,268KLOC 447 358 408 483
53
Experiment Results: Magnitude
RQ1 How many systems react to the API changes in the ecosystem and how many developers are involved?
– 178 systems, 252 packages, 503 classes, 796 methods, 134 developers– API changes have a large impact on the ecosystem– Localized per system whereas it is spread over several systems
isNil().ifTrue() → ifNil()
54
Experiment Results: Extension
RQ2 Do all the systems in the ecosystem react to the API changes?
– 2,188 systems, 107,549 methods, 1,579 developers– The number of affected systems are much higher than those that react– Systems may be dead (< 10 commits)
55
Experiment Results: Consistency
RQ3 Do the systems react to an API change in the same way?– Alternative rules can be found in the ecosystem– Rules can be more confidently extracted from frameworks
56
Ecosystem impact analysis:Summary
RQ1 API changes have a large impact on the ecosystem
RQ2 The number of affected systems are much higher than those that react
RQ3 Rules can be more confidently extracted from frameworks and libraries
57
Outline
1. Context and Motivation2. Problems3. Benefits of System-specific Rules4. Supporting System-specific Conventions with
Automatic Extraction5. Supporting Client Systems with Automatic
Extraction6. Impact of Software Evolution on Ecosystems7. Conclusion
58
Conclusion (1)
Software evolution is naturally a complex task
The impact of software evolution may be large and sometimes unknown
We need to ensure that changes are consistently propagated to dependent systems
59
Conclusion (2)
We reviewed several approaches to support software evolution
We argued for the need to analyze and improve rules as to better support developers keeping track of software evolution
We covered three aspects:– Benefits of system-specific rules– Improvement of automatically created rules– The impact of source code changes in a software ecosystem
60
Conclusion (3)
1. Assessing the relationship between system-specific violations and bugs– André Hora, Nicolas Anquetil, Stéphane Ducasse, Simon Allier. Domain Specific
Warnings: Are They Any Better? International Conference on Software Maintenance (ICSM), 2012.
2. Extracting system-specific conventions from software repositories– André Hora, Nicolas Anquetil, Anne Etien, Stéphane Ducasse, Marco Túlio
Valente. Mining System-specific Rules to Improve Static Analysis. Software Quality Journal, 2014. (major review)
– André Hora, Nicolas Anquetil, Stéphane Ducasse, Marco Túlio Valente. Mining System Specific Rules from Change Patterns. Working Conference on Reverse Engineering (WCRE), 2013.
61
Conclusion (4)
3. Extracting API evolution rules to support clients– André Hora, Anne Etien, Nicolas Anquetil, Stéphane Ducasse, Marco Túlio
Valente. APIEvolutionMiner: Keeping API Evolution under Control. Software Evolution Week (CSMR-WCRE), 2014.
– André Hora, Nicolas Anquetil, Anne Etien, Stéphane Ducasse, Marco Túlio Valente. Mining API Change Rules. (work in progress)
4. Analyzing the impact of software evolution on large-scale ecosystem– André Hora, Romain Robbes, Nicolas Anquetil, Anne Etien, Stéphane Ducasse.
How do Developers React to API Evolution? The Case of the Pharo Ecosystem. Empirical Software Engineering Journal, 2014. (submitted)
62
Future Work
Extracting other types of rules– Simply-deleted calls, method pairs, method call pairs
Better characterization of API evolution– Commits/releases, frameworks/clients
Ecosystem analysis– We focused on dynamically typed ecosystem– Lack: statically typed ecosystems– Dynamic vs. static
63
Collaboration (1)ASERG/UFMG group, Belo Horizonte, Brazil (2 months)
1. Cristiano Maffort, Marco Túlio Valente, Ricardo Terra, Mariza Bigonha, Nicolas Anquetil, André Hora. Mining Architectural Violations from Version History. Empirical Software Engineering Journal, 2014.
2. Cesar Couto, Marco Túlio Valente, Pedro Pires, André Hora, Nicolas Anquetil, Roberto S. Bigonha. BugMaps-Granger: a tool for visualizing and predicting bugs using Granger causality tests. Journal of Software Engineering Research and Development, 2014.
3. Cesar Couto, Pedro Pires, Marco Túlio Valente, Roberto S. Bigonha, André Hora, Nicolas Anquetil. BugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs. Congresso Brasileiro de Software: Teoria e Prática, 2013.
4. Cristiano Maffort, Marco Túlio Valente, Mariza Bigonha, Nicolas Anquetil, André Hora. Heuristics for Discovering Architectural Violations. Working Conference on Reverse Engineering (WCRE), 2013.
5. Cristiano Maffort, Marco Túlio Valente, Mariza Bigonha, André Hora, Nicolas Anquetil, Jonata Menezes. Mining Architectural Patterns Using Association Rules. International Conference on Software Engineering and Knowledge Engineering (SEKE), 2013.
6. André Hora, Nicolas Anquetil, Stéphane Ducasse, Muhammad Bhatti, Cesar Couto, Marco Túlio Valente, Julio Martins. BugMaps: A Tool for the Visual Exploration and Analysis of Bugs. European Conference on Software Maintenance and Reengineering (CSMR), 2012.
7. Simon Allier, Nicolas Anquetil, André Hora, Stéphane Ducasse. A Framework to Compare Alert Ranking Algorithms. Working Conference on Reverse Engineering (WCRE), 2012.
64
Collaboration (2)
Siemens Research & Technology Center, Erlangen, Germany (2 months)
– API evolution techniques– Case studies: C#– New source of extraction: attributes, super-classes,
methods (not only invocations)– Migration support (automatic if possible)– Automatic API evolution documentation– Evolution of other software artifacts (e.g., XML)
65
Assessing and Improving Rules to Support Software Evolution
1. Deep understanding of the benefits provided by system-specific rules– We investigate whether such rules are worthwhile enforcing [HADA12]
2. Improvement of automatically created rules to support evolving systems and their clients– We propose two solutions to extract better rules from software
repositories [HADV13,HEA+14,HAE+14b,HAE+14a]
3. Analysis of the impact of software evolution on clients– We report on an investigation on the awareness of an ecosystem
about software evolution [HRA+14]