1
Comparing the Decompositions Produced by Software Clustering
Algorithms using Similarity Measurements
2001 IEEE International Conference on Software Maintenance (ICSM'01).
Brian S. Mitchell & Spiros MancoridisMath & Computer Science, Drexel University
2Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Motivation
Using module dependencies when determining the similarity between two decompositionsis a good idea…
3Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Clustering the Structure of a System (1)Given the structure of a system…
4Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Clustering the Structure of a System (2)
The goal is to partition the system structure graphinto clusters…
The clusters shouldrepresent the
subsystems
5Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Clustering the Structure of a System (3)
But how do we know that the clustering result is good?
6Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Ways to Evaluate Software Clustering Results…
Assess it against a mental modelAssess it against a benchmark standardTechniques: Subjective Opinions Similarity Measurements
Given a software clustering result, we can:
7Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Example: How “Similar” are these Decompositions?
M1
M5
M2
M6
M3
M4
M7
M8
M1
M5
M2
M6
M3
M4
M7
M8
Green Edges:Similarity still the same…Red Edges:Not as similar…Conclusions:Once we add the red edges the similarity between PA and PB decreases
PA
PB
Blue Edges:Similarity still the same…
8Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Observations
Edges are important for determining the similarity between decompositionsExisting measurements don’t consider edges: Precision / Recall (similarity) MoJo (distance)
Our idea: Use the edges to determine similarity
9Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Research ObjectivesCreate new similarity measurements that use dependencies (edges) EdgeSim (similarity) MeCl (distance)
Evaluate the new similarity measurements against MoJo & Precision/RecallUse similarity measurements to support evaluation of software clustering results (see our WCRE’01 paper)
10Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Example: How “Similar” are these Decompositions?
M1
M5
M2
M6
M3
M4
M7
M8
M1
M5
M2
M6
M3
M4
M7
M8
Add Green Edges:PR, MoJo, MeCl & EdgeSim unchanged.
Add Red Edges:PR, MoJo unchanged. EdgeSim, MeCl reduced.
PA
PB
Add Blue Edges:PR, MoJo, MeCl & EdgeSim unchanged.
11Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Definitions
M1
M2
M3
M4
Internal/Intra-Edge: Edge within a cluster
External/Inter-Edge: Edge between two clusters
12Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
EdgeSim Example
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
c f g
hd e
i j
k l
MDG
13Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
EdgeSim Example
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
c f g
hd e
i j
k l
MDG
Step 1:Find Common
Inter- and Intra-Edges
14Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
EdgeSim Example
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
c f g
hd e
i j
k l
MDG
CommonEdge Weight
Total EdgeWeight
=10
19= 53%
15Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
MeCl Example
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
c f g
hd e
i j
k l
MDG
16Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
MeCl Example(AB)
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
A1 A3A2
B1 B2
17Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
MeCl Example(A1 B1)
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
c
A1,1
A1 A3A2
B1 B2
U
18Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
MeCl Example(A2 B1)
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
cf g
A1,1 A2,1
A1 A3A2
B1 B2
U
19Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
MeCl Example(A1 B2)
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
cf g
d e
A1,1 A2,1
A1,2
A1 A3A2
B1 B2
U
20Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
MeCl Example(A2 B2)
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
cf g
d e h
A1,1 A2,1
A1,2 A2,2
A1 A3A2
B1 B2
U
21Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
MeCl Example(A3 B2)
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
cf g
d e h
k l
i j
A1,1 A2,1
A1,2 A2,2
A3,2
A1 A3A2
B1 B2
U
22Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
B1
B2
MeCl Example(AB)
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
cf g
d e h
k l
i j
A1,1 A2,1
A1,2 A2,2
A3,2
A1 A3A2
B1 B2
23Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
MeCl Example (AB)
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
cf g
d e h
k l
i j
A1,1 A2,1
A1,2 A2,2
A3,2
A1 A3A2
B1 B2
B1
B2
Newly Introduced Inter-Edges
24Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
MeCl Example(BA)
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
cf g
d e h
k l
i j
B1,1 B1,2
B2,1 B2,2
B2,3
A1 A3A2
B1 B2
25Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
A1 A2
A3
MeCl Example(BA)
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
cf g
d e h
k l
i j
B1,1 B1,2
B2,1 B2,2
B2,3
A1 A3A2
B1 B2
26Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
MeCl Example (BA)
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
a b
cf g
d e h
k l
i j
B1,1 B1,2
B2,1 B2,2
B2,3
A1 A3A2
B1 B2
A1 A2
A3
Newly Introduced Inter-Edges
27Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
MeCl Calculation
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
A1 A3A2
B1 B2
Inter-Edges Introduced
MeCl(AB):({b,e},{e,c},{g,h},{f,h})
MeCl(BA):({e,i},{h,j},{b,f},{c,f},{h,e})
MeCl= 1 -maxW(MAB, MBA)Total Edge Weight
MeCl = 1 -5
19 = 73.7%
28Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Similarity Measurement RecapM1
M5
M2
M6
M3
M4
M7
M8
M1
M5
M2
M6
M3
M4
M7
M8
A1:
B1:
M1
M5
M2
M6
M3
M4
M7
M8
M1
M5
M2
M6
M3
M4
M7
M8
A2:
B2:
P1 P2
MoJo(P1) = MoJo(P2) = 87.5%
PR(P1) = PR(P2) = P:84.6%, R:68.7%, AVGPR=76.7%
Conclusion… P1 is equally similar to P2
29Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Similarity Measurement RecapM1
M5
M2
M6
M3
M4
M7
M8
M1
M5
M2
M6
M3
M4
M7
M8
A1:
B1:
M1
M5
M2
M6
M3
M4
M7
M8
M1
M5
M2
M6
M3
M4
M7
M8
A2:
B2:
P1 P2
EdgeSim(P1)=77.8% EdgeSim(P2)=58.3%
MeCl(P1)=88.9% MeCl(P2)=66.7%
Conclusion… P1 is more similar than P2
30Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Summary:EdgeSim & MeCl
EdgeSim: Rewards clustering algorithms for
preserving the edge types Penalizes clustering algorithms for
changing the edge types
MeCl: Rewards the clustering algorithm for
creating cohesive “subclusters”
31Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Special Modules
i j
k la b
c
d e
f g
h
c
f
g
a
b
e
h
k
l
d
i
j
PA
PB
A1 A3A2
B1 B2
Omnipresent Modules:“Strong” Connection to other ModulesLibrary Modules:Always used by other modules, never use other modules
Isomorphic Modules:Modules equally connected to other subsystems
32Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Special Modules
i
k la b
c f g
h
c
f
g
a
b
h
k
l
i
PA
PB
A1 A3A2
B1 B2
k
k
Omnipresent Modules:RemovedLibrary Modules:Removed
Isomorphic Modules:Replicated
Special Treatment of Special Treatment of Special Modules helps to Special Modules helps to determine the Similaritydetermine the Similarity
d
d
33Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Case Study Overview
Clustering Algorithms
Similarity Evaluation Tool
Similarity Analysis
Precision/Recall
Source Codevoid main(){ printf(“hello”);}
Clustered Result
M1
M2
M3
M5M4
M6
M7 M8
MoJo
EdgeSim
MeClAverage, Variance, etc. based on 100 clustering runs…(4950 Evaluations)
34Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Case Study ObservationsAll similarity measurements exhibit consistent behavior for the systems studied
Removal of “special” modules improved all similarity measurements Treating isomorphic modules specially only improved similarity slightlyEdgeSim and MeCl produced higher and less variable similarity values then Precision/Recall and MoJo
For all systems examined:If MeCl(SA) < MeCl(SB) then MoJo(SA) < MoJo(SB), PR(SA) < PR(SB), and EdgeSim(SA) < EdgeSim(SB)
35Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu
Questions
Special Thanks To: AT&T Research Sun Microsystems DARPA NSF US Army