cohesion and coupling optimisationcrest.cs.ucl.ac.uk/cow/49/slides/cow49_paixao.pdf · semantic and...
TRANSCRIPT
Cohesion and Coupling OptimisationBALANCING IMPROVEMENT AND DISRUPTION
MATHEUS PAIXAO, MARK HARMAN, YUANYUAN ZHANG, YIJUN YU
What I have (not) done
The ultimate solution for software modularisation
Full understanding of developers behaviour
Identification of the best metrics for software modularisation
What I have done
Thorough empirical study of structural cohesion and coupling optimisation
Identification of disruption as an important overlooked issue
Multi-objective approach to balance structural improvement and disruption
Structural Dependencies
p1
c1 c2 c3
c4 c5
p2
c6 c7
c8
p3
c9
Function Call
Data Access
Inheritance
Interface Implementation
Modularisation drivers
Semantic and Structural cohesion/coupling metrics better describe developers’ implementations [1][2]
[2] Candela, I., Bavota, G., Russo, B., & Oliveto, R. (2016). Using Cohesion and Coupling for Software Remodularization : Is It Enough ? ACM Transactions on Software Engineering and Methodology, 25(3), 1–28.
[1] Bavota, G., Dit, B., Oliveto, R., Di Penta, M., Poshyvanyk, D., & De Lucia, A. (2013). An empirical study on the developers’ perception of software coupling. In 2013 35th International Conference on Software Engineering (ICSE) (pp. 692–701). San Francisco: IEEE.
Literature Review
Metrics Validation
LongitudinalEvaluation
DisruptionAnalysis
largest empirical study of automated software
re-modularisation to date
Software Systems Under Study
Modularisation Quality - MQ
p1
c1 c2 c3
c4 c5
p2
c6 c7
c8
p3
c9
𝑀𝐹(𝑝1) = 0.72
𝑀𝐹(𝑝2) = 0.66
𝑀𝐹(𝑝3) = 0.00
𝑀𝑄 = 1.38
RQ1: Is there any evidence that open source software systems respect structural measurements of cohesion and coupling?
RQ1.1: Purely Random Distribution RQ1.2: k-Random Neighbourhood Search
RQ1.3: Systematic 1-Neighbourhood Search
Cohesion
99.99%
0.1001%
MQ
99.99%
0.0866%
RQ2: What is the relationship between raw cohesion and the MQ metric?
-100.00%
-50.00%
0.00%
50.00%
100.00%
150.00%
200.00%
250.00%
300.00%
350.00%
400.00%
Bunch Solutions
MQ Difference Cohesion Difference
Bunch Search Solutions
Pivot 2.0.2
13
Packages
58
All Systems
493.11%
RQ2: What is the relationship between raw cohesion and the MQ metric?
Package-constrained Search Solutions
0.00%
10.00%
20.00%
30.00%
40.00%
50.00%
60.00%
70.00%
Package-constrained Solutions
MQ Difference Cohesion Difference
DisMoJo
Disruption metric based on MoJoFM[2]
[3] Zhihua Wen, & Tzerpos, V. (2004). An effectiveness measure for software clustering algorithms. In Proceedings. 12th IEEE International Workshop on Program Comprehension, 2004. (pp. 194–203).
Disruption Analysis Workflow
Releases
Re-modularisationapproach
ImprovedModularisations
DisMoJo
DisruptionValues
BunchPackage-constrained
30 executions
RQ3: What is the disruption caused by search based approaches for optimising software modularisation?
Bunch Package-constrained
Mean
80.39% 57.82%
Release 1.2
p4
Longitudinal Disruption Analysis
Release 1.1
p1
c1
p2
p3
c2 c3
c4 c5
c9
c6 c7
c8
p1
c1
p2
p3
c2 c3
c4 c5
c9
c6 c7
c8
c10 c11 c12
Balancing modularity improvement and disruption
RQ5: What is the modularity improvement provided by the multiobjective search for acceptable disruption levels?
Package-free
Lower Bound
1.66% in MQ
0.13% in Cohesion
Upper Bound
150.38% in MQ
2.38% in Cohesion
Package-constrained
Lower Bound
3.36% in MQ
0.72% in Cohesion
Upper Bound
59.25% in MQ
23.49% in Cohesion
Conclusion
Software systems respect modularity measurements
Search based approaches for re-modularisation cause large disruption
Multiobjective search can be used to find clear and constant trade-off between modularity improvement and disruption
Modularity can be improved within lower and upper bounds of acceptable disruption performed by developers
Backup – System’s Selection Criteria
At least 10 subsequent official releases
No general libraries and APIs
Java systems
Backup - Software Systems Under Study
it’s an ordinal metric[1]
value has no meaning
it’s not normalised
Backup - Modularisation Metrics
avoids god packages
MQ
[4] Stanley Stevens. (1946). On the Theory of Scales of Measurement. American Association for the Advancement of Science, 103(2684), 677–680.
it’s an interval metric[1]
easy to understand
it’s normalised
leads to god packages
Cohesion
inflation effect
Backup – RQ1 table of results
Backup – RQ2 table of results
Backup – Classes distribution
RQ2.3: Package-constrained Search Solutions
30.06% cohesion improvement
34.15%
Biggest package
Smallest package
Original Implementations
0.32%
Biggest package
Smallest package
Package-constrained solutions
1.33%
19.80%
Backup – RQ3 table of results
Backup - RQ3: Best MQ and best cohesion disruption
Bunch Package-constrained
Mean
80.39% 57.82%
Best MQ
79.67% 55.30%
Best Cohesion
78.77% 54.33%
Backup – RQ4 Two-Archive GA parameters
Population Size: number of classes (N)
Single point crossover (0.8 if N < 100; 1.0 otherwise)
Swap mutation (0.004 x log2N)
Tournament selection (size 2)
50N Generations
Backup – RQ5 natural disruption
Backup – RQ5 modularity improvement within acceptable disruption