An Empirical Study of Function Clones in Open Source Software
DESCRIPTION
This is a presentation on a research paper in which the authors built a tool called NICAD.
TRANSCRIPT
![Page 1: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/1.jpg)
An Empirical Study of Function Clones in Open Source Software
Chanchal K. Roy and James R. Cordy
Queen’s University
Presenter: MF Khan
![Page 2: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/2.jpg)
Outline
• Introduction
• NICAD Overview
• Experimental Setup
• Experimental Results
• Conclusions
• Discussion
![Page 3: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/3.jpg)
Introduction
• Code Clone/Clone
  – Reusing a code fragment by copying and pasting, with or without minor modifications
• Benefits
  – Software maintenance (bug detection)
• History
  – Several techniques have been proposed
  – Lack of in-depth comparative studies on cloning in a variety of systems
![Page 4: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/4.jpg)
Introduction (Cont.)
• NICAD
  – In-depth study of function cloning in 15+ C and Java systems, including Apache and the Linux kernel
  – Accurate detection of near-miss function clones
  – Focuses on detecting copy/pasted near-miss clones by using pretty-printing, code normalization and filtering
  – Lightweight, using simple text-line comparison
  – Capable of detecting clones in very large systems in different languages
![Page 5: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/5.jpg)
NICAD Overview
• Three phases of clone detection
  – Extraction
    All potential clones are identified and extracted: all functions and methods in C and Java, with their original source coordinates.
  – Comparison (Determination of Clones)
    Potential clones are clustered and compared. The pretty-printed potential clones are compared line by line, text-wise, using the longest common subsequence (LCS).
![Page 6: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/6.jpg)
NICAD Overview
• Unique Percentage of Items (UPI)
  – If the UPI for both line sequences is zero or below a certain threshold, the potential clones are considered to be clones
• Reporting
  – Results from NICAD are reported in XML database form and as interactive HTML
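The comparison phase described above can be sketched in a few lines of Python. This is only an illustration, not NICAD's actual implementation: it assumes that UPI is the fraction of a fragment's pretty-printed lines that fall outside the longest common subsequence, and the function names `lcs_length`, `upi` and `is_clone` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists of text lines."""
    prev = [0] * (len(b) + 1)
    for line_a in a:
        curr = [0]
        for j, line_b in enumerate(b, start=1):
            if line_a == line_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def upi(lines, other):
    """Assumed UPI: share of `lines` that is not part of the LCS with `other`."""
    if not lines:
        return 0.0
    return (len(lines) - lcs_length(lines, other)) / len(lines)


def is_clone(a, b, threshold=0.0):
    """Two fragments are clones when the UPI of BOTH is at or below the threshold."""
    return upi(a, b) <= threshold and upi(b, a) <= threshold
```

For two ten-line fragments that differ in a single line, both UPI values are 0.1, so the pair is rejected at threshold 0.0 and accepted at threshold 0.10.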
![Page 7: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/7.jpg)
Experimental Setup
The paper applies NICAD to find function clones in a number of open source systems.
It then introduces a set of metrics to analyze the results.
![Page 8: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/8.jpg)
Experimental Setup
Subject systems: 10 C and 7 Java systems
![Page 9: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/9.jpg)
Clone Definition
• Non-empty functions of at least 3 LOC
• In pretty-printed format
• Different Unique Percentage of Items (UPI) thresholds are used to find exact and near-miss clones
• E.g.
  – UPI threshold 0.0: exact clones only
  – UPI threshold 0.10: two functions whose pretty-printed lines differ by at most 10% count as clones
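The definition above amounts to a size filter plus a threshold test. The sketch below assumes the UPI values of both functions have already been computed; `MIN_LOC`, `is_candidate` and `classify_pair` are hypothetical names, and treating blank lines as not counting toward LOC is an assumption.

```python
MIN_LOC = 3  # from the slide: functions of at least 3 pretty-printed lines


def is_candidate(pretty_lines):
    """Keep only functions with at least MIN_LOC non-blank lines (assumption)."""
    non_blank = [line for line in pretty_lines if line.strip()]
    return len(non_blank) >= MIN_LOC


def classify_pair(upi_a, upi_b, threshold):
    """Label a pair of functions given each function's UPI and a threshold."""
    if upi_a == 0.0 and upi_b == 0.0:
        return "exact"
    if upi_a <= threshold and upi_b <= threshold:
        return "near-miss"
    return "not a clone"
```

With a threshold of 0.0 only the "exact" label can be produced; raising the threshold to 0.10 additionally admits pairs in which each function has at most 10% unique lines.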
![Page 10: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/10.jpg)
Validation of Clones
• Validating the detected clones is a two-step process
• 1: NICAD's interactive HTML output
  – Gives an overall view of the original source of the clone classes
• 2: XML output
  – Pairwise comparison of the original source of the functions in each clone class
  – Uses Linux diff to determine the textual similarity of the original source
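The second validation step can be approximated as follows. The slides say Linux diff was used; this sketch swaps in Python's standard `difflib` module so it stays self-contained, and the `clone_class` structure (a dict from a function label to its original source text) is an assumption rather than NICAD's XML format.

```python
import difflib
from itertools import combinations


def pairwise_diffs(clone_class):
    """Diff every pair of functions in one clone class.

    clone_class: dict mapping a function label to its original source text.
    Returns a dict mapping (label_a, label_b) to the list of unified-diff lines;
    an empty list means the two sources are textually identical.
    """
    reports = {}
    for (name_a, src_a), (name_b, src_b) in combinations(clone_class.items(), 2):
        diff = difflib.unified_diff(
            src_a.splitlines(),
            src_b.splitlines(),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
        reports[(name_a, name_b)] = list(diff)
    return reports
```

A reviewer would then inspect the non-empty diffs to judge whether each reported near-miss pair is a genuine clone.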
![Page 11: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/11.jpg)
Metrics and Visualizations
• Total Cloned Methods (TCM)
  – Gives the overall cloning statistics
• File Associated with Clone (FAWC)
  – Overall localization of clones
  – From a software maintenance point of view, a lower value of FAWCP is desirable. Why?
  – Because the clones are then localized to certain specific files and thus may be easier to maintain
  – Still, one can’t say which files contain the majority of clones in the system
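To make the two metrics above concrete, here is a minimal sketch. The slides do not give the formulas, so it assumes the natural reading: TCM as the percentage of all methods that are cloned, and FAWC as the percentage of files containing at least one cloned method. The record layout `(file, method, is_cloned)` and the function names are hypothetical.

```python
def tcm_percent(methods):
    """Assumed TCM: percentage of methods that take part in some clone."""
    cloned = sum(1 for _, _, is_cloned in methods if is_cloned)
    return 100.0 * cloned / len(methods)


def fawc_percent(methods):
    """Assumed FAWC: percentage of files holding at least one cloned method."""
    all_files = {file_name for file_name, _, _ in methods}
    clone_files = {file_name for file_name, _, is_cloned in methods if is_cloned}
    return 100.0 * len(clone_files) / len(all_files)


# Each record: (file, method, is_cloned)
methods = [
    ("a.c", "f1", True),
    ("a.c", "f2", True),
    ("b.c", "g1", False),
    ("c.c", "h1", False),
    ("d.c", "k1", False),
]
print(tcm_percent(methods), fawc_percent(methods))
```

In this toy system 40% of the methods are cloned, but only one file in four is affected, which is the localized situation the slide describes as easier to maintain.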
![Page 12: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/12.jpg)
Metrics and Visualizations
• Cloned Ratio of File for Methods (CRFM)
  – With CRFM we attempt to discover highly cloned files
  – Computed for a particular file (f)
• Profile of Cloning Locality w.r.t. Methods (PCLM)
  – Kapser and Godfrey provide three location-based categories of function clones
  – 1: In the same file 2: In the same directory 3: In different directories
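Both metrics above can be sketched briefly. The formulas are assumptions based on the metric names: CRFM is taken as the percentage of methods in file f that are cloned, and PCLM as the percentage of clone pairs in each of the three location categories. Paths are assumed to be POSIX-style, and all function names are hypothetical.

```python
import posixpath
from collections import Counter


def crfm_percent(methods, file_name):
    """Assumed CRFM(f): percentage of the methods in file f that are cloned."""
    flags = [is_cloned for f, _, is_cloned in methods if f == file_name]
    return 100.0 * sum(flags) / len(flags)


def pair_locality(path_a, path_b):
    """Place a clone pair in one of the three location categories."""
    if path_a == path_b:
        return "same file"
    if posixpath.dirname(path_a) == posixpath.dirname(path_b):
        return "same directory"
    return "different directory"


def pclm(clone_pairs):
    """Percentage of clone pairs per location category."""
    counts = Counter(pair_locality(a, b) for a, b in clone_pairs)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}
```

A file with a high CRFM value is a candidate for closer inspection, while the PCLM breakdown shows how far apart the two halves of each clone pair tend to be.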
![Page 13: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/13.jpg)
Experimental Results
1. More function cloning in open source Java than in C. On average about 15% (7.2% w.r.t. LOC).
2. The effect of increasing the UPI threshold is almost identical.
![Page 14: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/14.jpg)
Detail Overview
1. Several of the C systems have fewer than 10% cloned functions.
2. The Java systems are consistent in their amount of cloning.
![Page 15: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/15.jpg)
Clone Associated Files
![Page 16: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/16.jpg)
Clone Associated Files
• FAWC addresses the issue of what portion of the files in a system is associated with clones.
• From a software maintenance point of view, a system with more clones that are associated with only a few files is in some sense better than a system with fewer clones scattered over many files.
![Page 17: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/17.jpg)
Profiles of Cloning Density
• It tells us which files are highly cloned, or which files contain the majority of clones
That means scattered files and more near-miss clones.
![Page 18: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/18.jpg)
Profile of Cloning Density
Assumption: cloned methods in high-density cloned files have been intentionally copy/pasted.
![Page 19: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/19.jpg)
Profile of Cloning Localization
The location of a clone pair is a factor in software maintenance.
Except for Linux, there are no exact clones (UPI threshold 0.0) in C.
When the UPI threshold is 0.3, on average 45.9% (49.0% LOC) of clone pairs in C occur.
![Page 20: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/20.jpg)
Conclusion
• NICAD is capable of accurately finding:
  1. Exact function clones
  2. Near-miss function clones
![Page 21: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/21.jpg)
Discussion
• What is the definition of a clone?
• What is the definition of near-miss clones?
• Why is Weltab higher in slide 14?
• What if we use C++ or C#?
• What will happen if we use a smaller clone granularity, such as begin-end blocks?
![Page 22: An Empirical Study Of Function Clones In Open Source Software](https://reader034.vdocument.in/reader034/viewer/2022042714/556ee96dd8b42ad36d8b468e/html5/thumbnails/22.jpg)
Thank you.