Empirical Results on Cloning and Clone Detection

Stefan Wagner (@prof_wagnerst)
Alpen-Adria-Universität Klagenfurt, 1 June 2015


Page 1: Empirical Results on Cloning and Clone Detection

Empirical Results on Cloning and Clone Detection

Stefan Wagner (@prof_wagnerst)
Alpen-Adria-Universität Klagenfurt, 1 June 2015

www.uni-stuttgart.de

Page 2: Empirical Results on Cloning and Clone Detection

You can copy, share and change, film and photograph, blog, live-blog and tweet this presentation, given that you attribute it to its author and respect the rights and licences of its parts.

Based on templates by @SMEasterbrook and @ethanwhite

Page 3: Empirical Results on Cloning and Clone Detection

Technische Universität München

Page 4: Empirical Results on Cloning and Clone Detection

Class A Class B

Page 5: Empirical Results on Cloning and Clone Detection

Class A Class B

Page 6: Empirical Results on Cloning and Clone Detection

Class A Class B

Page 7: Empirical Results on Cloning and Clone Detection
Page 8: Empirical Results on Cloning and Clone Detection

Often 20%–30% redundancy

Page 9: Empirical Results on Cloning and Clone Detection

We need to detect and remove clones reliably and automatically.

Page 10: Empirical Results on Cloning and Clone Detection

Types of Clones

Type 1 an exact copy without modifications (except for whitespace and comments)

Type 2 a syntactically identical copy; only variable, type, or function identifiers have been changed

Type 3 a copy with further modifications; statements have been changed, added, or removed
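These three types can be illustrated with a small Python sketch (all function names and fragments are invented for illustration):

```python
# Original fragment.
def total_price(items, tax):
    subtotal = 0
    for item in items:
        subtotal += item
    return subtotal * (1 + tax)

# Type-1 clone: an exact copy (only whitespace/comments may differ).
def total_price_copy(items, tax):
    subtotal = 0
    for item in items:
        subtotal += item
    return subtotal * (1 + tax)

# Type-2 clone: syntactically identical, only identifiers renamed.
def order_sum(entries, rate):
    acc = 0
    for entry in entries:
        acc += entry
    return acc * (1 + rate)

# Type-3 clone: further modified; a statement has been added.
def total_price_capped(items, tax):
    subtotal = 0
    for item in items:
        subtotal += item
    subtotal = min(subtotal, 1000)  # added statement
    return subtotal * (1 + tax)
```

Note that the type-1 and type-2 clones compute exactly the same result, while the type-3 clone diverges in behaviour.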

Page 11: Empirical Results on Cloning and Clone Detection

Clone detection: processing steps

1. Load (from storage)
2. Tokenise & normalise
3. Find duplicates
4. Extract clones
5. Visualise
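A minimal sketch of such a token-based pipeline (the lexer, the placeholder normalisation, and the window length are illustrative choices, not the actual tool's):

```python
import re

def tokenise(code):
    # Crude lexer: identifiers/words, numbers, and single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def normalise(tokens):
    # Map every identifier to a placeholder so type-2 clones match.
    keywords = {"def", "for", "in", "return", "if", "while"}
    return ["ID" if re.match(r"[A-Za-z_]\w*$", t) and t not in keywords else t
            for t in tokens]

def find_duplicates(tokens, min_len=8):
    # Index every window of min_len normalised tokens; a window seen
    # at more than one position is reported as a clone candidate.
    seen, clones = {}, []
    for i in range(len(tokens) - min_len + 1):
        window = tuple(tokens[i:i + min_len])
        if window in seen:
            clones.append((seen[window], i))
        else:
            seen[window] = i
    return clones

code = """
def f(a, b):
    return a + b
def g(x, y):
    return x + y
"""
clones = find_duplicates(normalise(tokenise(code)))
```

Here `g` is a type-2 clone of `f`, so after normalisation the detector reports duplicate token windows starting at the two `def` positions.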

Page 12: Empirical Results on Cloning and Clone Detection

Measures for cloning

• Number of clone groups / clone instances
• Size of the largest clone / cardinality of the most frequent clone
• Cloned statements: the number of statements in the system that are part of at least one clone
• Clone coverage:
  – #cloned statements / #statements
  – the probability that a randomly chosen statement is part of a clone
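The last two measures can be sketched directly (hypothetical helpers, not from the slides):

```python
def cloned_statement_count(clone_instances):
    # Each instance is a (start_line, end_line) range; a statement that
    # appears in several clones is counted only once.
    covered = set()
    for start, end in clone_instances:
        covered.update(range(start, end + 1))
    return len(covered)

def clone_coverage(cloned_statements, total_statements):
    # Share of statements that are part of at least one clone; equally,
    # the probability that a randomly chosen statement is cloned.
    if total_statements == 0:
        return 0.0
    return cloned_statements / total_statements

# Hypothetical system: two overlapping clone instances, 1000 statements.
cloned = cloned_statement_count([(1, 5), (4, 8)])  # lines 1..8, counted once
coverage = clone_coverage(300, 1000)
```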

Page 13: Empirical Results on Cloning and Clone Detection

Visualisation of clone detection results

• Compare view (~20 LOC)
• Seesoft view (~400 LOC)
• Tree maps (>1,000,000 LOC)
• Trends over time

Page 14: Empirical Results on Cloning and Clone Detection

Technische Universität München

1 Code Clones

Page 15: Empirical Results on Cloning and Clone Detection

Inconsistencies

Can you spot the difference?

Page 16: Empirical Results on Cloning and Clone Detection

How problematic are these inconsistencies (and clones)?

Indicating harmfulness:
• [Lague97]: inconsistent evolution of clones in industrial telecom software
• [Monden02]: higher revision numbers for files with clones in legacy software
• [Kim05]: substantial amount of coupled changes to code clones
• [Li06], [SuChiu07], [Aversano07], [Bakota07]: discovery of bugs through search for inconsistent clones or clone evolution analysis

Doubting harmfulness:
• [Krinke07]: inconsistent clones hardly ever become consistent later
• [Geiger06]: failure to statistically verify an impact of clones on change couplings
• [Lozano08]: failure to statistically verify an impact of clones on changeability
• [Göde11]: most changes are intentionally inconsistent
• [Rahman12]: no statistically significant impact on faults

Page 17: Empirical Results on Cloning and Clone Detection

Our First Study at ICSE 2009

• Manual inspection of inconsistent clones by system developers (no indirect measures of the consequences of cloning)

• Both industrial and open source software analysed

• Quantitative data

Deissenboeck, Juergens, Hummel, Wagner, ICSE, 2009

Page 18: Empirical Results on Cloning and Clone Detection

Research Questions

RQ1: Are clones changed inconsistently? (|IC| / |C|)

RQ2: Are inconsistent clones created unintentionally? (|UIC| / |IC|)

RQ3: Can inconsistent clones be indicators for faults in real systems? (|F| / |IC|, |F| / |UIC|)

Sets: clone groups C (exact and inconsistent); inconsistent clone groups IC; unintentionally inconsistent clone groups UIC; faulty clone groups F

Page 19: Empirical Results on Cloning and Clone Detection

Study Design

1. Clone group candidate detection → clone group candidates CC
   • Novel algorithm
   • Tailored to the target program
2. False positive removal → clone groups C, inconsistent clone groups IC
   • Manual inspection of all inconsistent and one quarter of the exact CCs
   • Performed by researchers
3. Assessment of inconsistencies → unintentionally inconsistent clone groups UIC, faulty clone groups F
   • All inconsistent clone groups inspected
   • Performed by developers

Page 20: Empirical Results on Cloning and Clone Detection

Study Objects

• Munich Re: international reinsurance company, 37,000 employees
• LV 1871: Munich-based life-insurance company, 400 employees
• Sysiphus: open-source collaboration environment for distributed software development, developed at TUM

System    Organization  Language  Age (years)  Size (kLoC)
A         Munich Re     C#        6            317
B         Munich Re     C#        4            454
C         Munich Re     C#        2            495
D         LV 1871       Cobol     17           197
Sysiphus  TUM           Java      8            281

Page 21: Empirical Results on Cloning and Clone Detection

Results

Project                   A    B    C    D  Sys.   Sum
Clone groups |C|        286  160  326  352  303   1427
Inconsistent CGs |IC|   159   89  179  151  146    724
Unint. incons. |UIC|     51   29   66   15   42    203
Faulty CGs |F|           19   18   42    5   23    107
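From the sums in this table, the ratios behind the three research questions can be recomputed as a quick sanity check:

```python
# Totals over all five systems from the results table.
C, IC, UIC, F = 1427, 724, 203, 107

rq1 = IC / C      # RQ1: share of clone groups changed inconsistently
rq2 = UIC / IC    # RQ2: share of inconsistent groups created unintentionally
rq3a = F / IC     # RQ3: faulty groups among the inconsistent ones
rq3b = F / UIC    # RQ3: faulty groups among the unintentionally inconsistent ones
```

Roughly half of the clone groups were changed inconsistently, about 28 % of those unintentionally, and about 15 % of the inconsistent groups contained a fault.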

Page 22: Empirical Results on Cloning and Clone Detection

Threats to Validity

Construct: analysis of the latest version instead of the evolution. Mitigation: all inconsistencies are of interest, independent of their creation time.

Internal: developer review error. Mitigation: the conservative strategy only makes positive answers harder.

Internal: clone detector configuration. Mitigation: validated during a pre-study.

External: system selection not random (impact on transferability). Mitigation: 5 different development organisations, 3 different languages, technically different systems.

Page 23: Empirical Results on Cloning and Clone Detection

Our Second Study

• Investigating the evolution of type-3 clones
• Relationship with documented faults from the issue tracker
• Industrial systems

Page 24: Empirical Results on Cloning and Clone Detection

Research Questions

RQ1: Do software systems contain type-3 clones? (|CT3| / |C|)

RQ2: Do type-3 clones contain documented faults? (|CT3F| / |CT3|)

RQ3: Are developers aware of type-3 clones? (|IMS| / |IM|, |CX| / |CT3F|, |CT2F → CT3NF|)

Sets: clone groups C (exact and inconsistent); type-3 (inconsistent) clone groups CT3; faulty type-3 clone groups CT3F

Page 25: Empirical Results on Cloning and Clone Detection

Data Collection and Analysis

Tool support: code, documentation, inspection and test results are extracted from the system versions (v1, v2, v3), analysed, and queried for relationships and evolution; the results are presented via a quality model editor and an HTML dashboard.

Page 26: Empirical Results on Cloning and Clone Detection

Study Objects

The results of the SQL queries are required for the analysis of the inconsistent clones for faults.

After executing all the SQL queries, the data is ready for analysis. To determine the faulty code in a version, the version of the file in which an error was found is recorded. If the repository of a project is at an older version, the revision history is updated with Mercurial's pull function; new changes are each incorporated with a commit message in the revision history and thus obtain a changeset ID. In TortoiseHg, the revision history can be viewed for each file. Thus, for each file of an inconsistent clone class, the entire revision history and development are checked. Furthermore, it can be determined whether an error was corrected in the inconsistent clone files during development. From these results, conclusions can then be drawn about the faultiness of the inconsistent clone classes.

H. Validity Procedure

1) Construct validity: The development history of the three systems was analyzed to determine whether the inconsistent clones were introduced by changes to a system. The problem is that the code fragments were inserted by copying and modifying within a single commit. Therefore, the entire revision history of the industrial systems has been manually processed to check all changes to a code fragment. Another threat to construct validity is that only clone classes with a bug in the issue-tracking system were used for each system as the basis for faulty code fragments.

Other threats to construct validity should be added here

2) Internal validity:

3) External Validity:

IV. RESULTS (ASIM, STEFAN)

A. Case Description

TABLE I. SUMMARY OF THE STUDY OBJECTS

System  Domain      Lang.  Size (KLOC)  Revision  Age (Years)  Developers
A       Automotive  Java   253          2470      4            10
B       Automotive  Java   332          1622      5            5
C       Automotive  Java   454          2181      4            10

B. Share of Type-3 Clones (RQ 1)

Table III contains the quantitative results for all research questions in detail. We found a mean share of type-3 clones among all clones, over all three systems, of 52 %. Yet it varied quite strongly, from 23 % in system B to 79 % in system C. Nevertheless, in all three systems there is a considerable share of type-3 clones and, hence, it is useful to investigate their relationship with faults.

Should we include the liberal/conservative detection approach stuff here?

Answer to RQ 1: On average, every second clone class is a type-3 clone class. Therefore, they are a substantial part of all clones.

TABLE IV. SUMMARY OF RESULTS

Project                                        A     B     C  Total
Clone classes |C|                             37    88    82    207
Type-3 clone classes |CT3|                    21    21    65    107
RQ 1: |CT3| / |C|                           0.56  0.23  0.79   0.52
Faulty clone classes |CF|                     16     5    37     58
Faulty type-3 clone classes |CT3F|             7     1     2     10
RQ 2: |CT3F| / |CT3|                        0.33  0.05  0.03   0.17
Type-3 clones |I|                             46    43   146    235
Modified type-3 clones |IM|                   24    19    67    110
Simultaneously modified type-3 clones |IMS|   14    17    62     93
RQ 3.1: |IMS| / |IM|                        0.58  0.89  0.92   0.85
Fixed type-3 clone classes |CX|                4     1     0      5
RQ 3.2: |CX| / |CT3F|                       0.57  1.00     0    0.5
Faulty type-2 clone classes |CT2F|             9     4    35     48
Non-faulty type-3 clone classes |CT3NF|       14    20    63     97
|CT2F → CT3NF|                                 9     4    35     48
RQ 3.3: |CT2F → CT3NF| / |CT2F|                1     1     1      1
Mean length of type-3 clones                  60    62    78
Mean length of faulty type-3 clones           50    39    83
RQ 4:                                          ?     ?     ?      ?

C. Type-3 Clones with Documented Faults (RQ 2)

We found documented faults in 58 clone classes overall. Ten of these were type-3 clone classes. Interestingly, while system C had the most faulty clone classes (37), system A had the most faulty type-3 clone classes (7). Hence, in our small sample, the relationship between faulty clone classes and faulty type-3 clone classes is not linear. This could be an indication that other factors, such as the developers' awareness of clones, play a role.

The ratio of faulty type-3 clone classes in relation to all type-3 clone classes is on average 17 %. Because of the discussed imbalance between faulty clone classes and faulty type-3 clone classes, this ratio varies strongly, from 3 % in system C to 33 % in system A. Again, this could be an indication that the developers of systems B and C were more aware of the clones and, hence, introduced fewer inconsistencies which represent faults.

Potentially, the ratio of faulty type-3 clone classes could be higher, because we only analysed documented faults. There still might be several faults in the code not detected so far (as observed by Juergens et al. [2]). Yet, we wanted to concentrate on faults that had an actual effect and led to failures.

Answer to RQ 2: On average, 17 % of all type-3 clone classes contained a documented fault. The range is from 3 % to 33 %. Therefore, type-3 clones do contain documented faults, but not at a high ratio. Developer awareness of the clones might play a role in the variance of the results.

D. Developers’ Awareness of Type-3 Clones (RQ 3)

In RQ 1, we established that type-3 clones are interesting to investigate. In RQ 2, we found that type-3 clones contain documented faults, as well as an indication that developer awareness of these clones plays a role. Therefore, we investigate this awareness more closely. We analysed three different indications of this awareness, which we discuss in the following.

Page 27: Empirical Results on Cloning and Clone Detection

Results

(Repeats the draft excerpt and tables shown on Page 26.)

Page 28: Empirical Results on Cloning and Clone Detection

Conclusions

• About half of all clone classes are type-3 clones.
• The rate of faulty type-3 clones is about 17 %.
• There is a high awareness of clones and inconsistencies.
• This awareness seems to influence how many faults are related to type-3 clones.
• Further studies should take this into account.
• Making developers aware of clones still seems to be worthwhile.

Page 29: Empirical Results on Cloning and Clone Detection

Technische Universität München

2 Model Clones

Page 30: Empirical Results on Cloning and Clone Detection

Why not analyse generated code?

_1 = In * I;
_2 = In * P;
_3 = In * D;
_4 = _1 + I-Delay;
_5 = _3 - D-Delay;
Out = _4 + _2 + _5;
I-Delay = _4;
D-Delay = _3;

Page 31: Empirical Results on Cloning and Clone Detection

Clone Detection Pipeline

Simulink files → (Simulink parser) → Simulink models → (normalisation) → flat labeled graph → (detection) → clone pairs → (clustering) → clone groups → (visualisation)

Page 32: Empirical Results on Cloning and Clone Detection

Data Flow Clones

• Basic criteria:
  – Functionally independent (abstracted elements)
  – Reusable (connected)
  – Functionally complex (size of functional elements)
  – General (number of instances)
• Additional criterion, relevance:
  – Application-domain specific
  – Intellectual property
• Typical examples:
  – Sensor data plausibilisation
  – Error management

Page 33: Empirical Results on Cloning and Clone Detection

Deissenboeck, Wagner et al., ICSE'08

37 % of relevant blocks are part of at least one clone group.

Page 34: Empirical Results on Cloning and Clone Detection

Conclusions

• It is possible to formulate a useful understanding of clones in models.
• Graph-based models need different detection algorithms.
• If models are used for code generation, they will contain clones similar to those in source code.

Page 35: Empirical Results on Cloning and Clone Detection

Technische Universität München

3 Requirements Clones

Page 36: Empirical Results on Cloning and Clone Detection

"Redundancy [in requirements specifications] causes good engineers to suffer and the resulting systems will probably suffer, too."

–Matthias Weber, Joachim Weisbrod

Page 37: Empirical Results on Cloning and Clone Detection

Modifiability generally requires a requirements specification to […] not be redundant.

–IEEE 830-1998

Page 38: Empirical Results on Cloning and Clone Detection

Terms

Requirements specification: "specification for a particular software product, program, or set of programs that performs certain functions in a specific environment." [IEEE 830-1998]

Clone:
• Duplicated specification text of at least 20 words
• Small differences (e.g., declination) are tolerated
• Must refer to the specified system
• False positives: e.g., page footers with copyright information
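The 20-word threshold above can be sketched as a word-window detector; note that, unlike the study's definition, this sketch does not tolerate small differences such as declination, and the sample requirement text is invented:

```python
import re

def find_text_clones(text, min_words=20):
    # Slide a window of min_words lowercased words over the text and
    # report any window that occurs at more than one position.
    words = re.findall(r"\w+", text.lower())
    seen, clones = {}, []
    for i in range(len(words) - min_words + 1):
        window = tuple(words[i:i + min_words])
        if window in seen:
            clones.append((seen[window], i))
        else:
            seen[window] = i
    return clones

# A 21-word requirement duplicated verbatim, as in a copied use case.
spec = ("The system shall log every failed login attempt with a timestamp "
        "and notify the administrator within five minutes of the event. ") * 2
clones = find_text_clones(spec)
```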

Page 39: Empirical Results on Cloning and Clone Detection

Research questions

1. How much cloning do real-world requirements specifications contain?
2. What kind of information is cloned in requirements specifications?
3. What consequences does cloning in requirements specifications have?
4. Can cloning in requirements specifications be detected accurately using existing clone detectors?

Page 40: Empirical Results on Cloning and Clone Detection

Study design

1. Random assignment of specifications
2. Detection tool execution
3. Inspection of detected clones: if false positives are found, filters are added and detection is repeated; otherwise continue
4. Categorisation of clones
5. Independent re-categorisation; analysis of corresponding source code
6. Data analysis & interpretation

Page 41: Empirical Results on Cloning and Clone Detection

Adding of filters

• Regular expressions
• Removal of clones
• Improvement in precision
• Categorisation of the types of false positives
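This filter step can be sketched as regex-based false positive removal; the patterns below are invented examples of recurring non-requirement text such as copyright footers, not the study's actual filters:

```python
import re

# Hypothetical filter patterns for recurring non-requirement text.
FILTERS = [
    re.compile(r"copyright\s+\d{4}", re.IGNORECASE),
    re.compile(r"page\s+\d+\s+of\s+\d+", re.IGNORECASE),
]

def is_false_positive(clone_text):
    # A detected clone is discarded if any filter pattern matches it.
    return any(p.search(clone_text) for p in FILTERS)

candidates = [
    "Copyright 2009 Example Corp. All rights reserved.",
    "The system shall persist all contract data in system Z.",
]
kept = [c for c in candidates if not is_false_positive(c)]
```

Each filtered-out match is then categorised, which is how the study characterised the types of false positives.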

Page 42: Empirical Results on Cloning and Clone Detection

Study design (flow as shown on Page 40)

Page 43: Empirical Results on Cloning and Clone Detection

Categorisation of clones

• Qualitative analysis: content analysis
• A sample is categorised
• Mix of theory-based and Grounded Theory coding
• 4+8 categories
• Documentation of additional information (mostly inconsistencies between clones)

Page 44: Empirical Results on Cloning and Clone Detection

Study design (flow as shown on Page 40)

Page 45: Empirical Results on Cloning and Clone Detection

Independent re-categorisation

• 2 raters
• Sample: 5 specifications, 5 clone groups
• Analysis of inter-rater agreement

Page 46: Empirical Results on Cloning and Clone Detection

Study design (flow as shown on Page 40)

Page 47: Empirical Results on Cloning and Clone Detection

Study objects

• 28 specifications
• 11 organisations
• 8,667 pages
• over 1.2 million words
• English & German

Domains: automotive, avionics, finance, telecommunication, transport

Page 48: Empirical Results on Cloning and Clone Detection

Typical clones

• Entire use cases copied
• Similar combinations of pre- and postconditions copied
• Descriptions of terms or roles copied

Example* (42 instances, 61 words; 13 instances with > 100 words), shown several times verbatim on the slide:

“The contracts with the clients describe the conditions regarding obligatory liabilities that the clients have agreed on with X. The liabilities are calculated from the exposures from Y and the contract conditions from X. The liability-relevant parts of the contracts thus need to be managed in system Z.”

*Translated from German

Page 49: Empirical Results on Cloning and Clone Detection

1.How much cloning do real-world requirements specifications contain?

(Chart: clone coverage in percent per specification, ranging from 0 % to 71.6 %. Mean: 13.6 %.)

Page 50: Empirical Results on Cloning and Clone Detection

2.What kind of information is cloned?

(Chart: percentage of clones per category; more than one category per clone is possible. Categories from most to least frequent: use case step (24 %), reference, UI, domain knowledge, interface description, precondition, side condition, configuration, feature, technical domain knowledge, postcondition, rationale (1 %).)

Page 51: Empirical Results on Cloning and Clone Detection

3.What consequences does cloning have?

(Chart: additional effort in hours per inspector, per specification, ranging from 0 to 36.7 hours. Mean: 6.)

Page 52: Empirical Results on Cloning and Clone Detection

Modification• Multiple inconsistent specification clones identified

• Differences suspected to be unintentional⇒ Indication that inconsistent updates happen in practice

ImplementationTraced specification clone groups to implementation. 3 cases:

• Shared abstraction

• Cloned code

• Independent reimplementation of similar functionality⇒ Indication that spec. cloning causes redundancy in

implementation

Page 53: Empirical Results on Cloning and Clone Detection

4.Can cloning be detected accurately using existing clone detectors?

(Chart: precision in percent per specification, before and after tailoring. Before tailoring, precision varied widely, from single-digit percentages up to 100 %; after tailoring it ranged from 85 % to 100 %, reaching 100 % for most specifications.)

Page 54: Empirical Results on Cloning and Clone Detection

Threats to validity

Internal
• Pairs of researchers to reduce errors during manual steps
• Reading speeds for cloned vs. non-cloned text? Assumed similar; further research required
• Recall unclear, but this does not affect the study results

External
• Substantial differences between requirements specifications (format, organisation, language, …)
• But: a large number of study objects from different companies and domains

Page 55: Empirical Results on Cloning and Clone Detection

Conclusion

Lessons learned:
• Many specifications contain cloning
• Negative impact on reading and inspection effort
• Indication of corresponding redundancy in the source code
• Cloning is not necessary: many specifications contain none
• Tailoring is required but feasible; the effort is small compared to the inspection overhead

Future work:
• How can cloning be avoided or removed?
• What are the causes of cloning? Do they differ from those of code clones?
• Further studies on the consequences for the implementation

Page 56: Empirical Results on Cloning and Clone Detection

Outlook

• Other artefacts, e.g. test cases
• Effects and costs of cloning
• Functionally similar code detector

Page 57: Empirical Results on Cloning and Clone Detection

We need to detect and remove clones reliably and automatically.

Page 58: Empirical Results on Cloning and Clone Detection

Pictures Used in this Slide Deck

"Mercurial Logo" by Mackall (http://www.selenic.com/hg-logo/)