Empirical Results on Cloning and Clone Detection

Empirical Results on Cloning and Clone Detection
Stefan Wagner (@prof_wagnerst), www.uni-stuttgart.de
Graduiertenkolleg PUMA, TU München, 27 November 2015

You can copy, share and change, film and photograph, blog, live-blog and tweet this presentation, provided that you attribute it to its author and respect the rights and licences of its parts.
Based on templates by @SMEasterbrook and @ethanwhite

Technische Universität München

Class A Class B

Often 20%–30% redundancy

We need to detect clones reliably and automatically.

Types of Clones
• Type 1: an exact copy without modifications (except for whitespace and comments)
• Type 2: a syntactically identical copy; only variable, type, or function identifiers have been changed
• Type 3: a copy with further modifications; statements have been changed, added, or removed
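To make the three clone types concrete, here is a small illustrative sketch in Python. All functions are hypothetical examples invented for this illustration; none of them comes from the studied systems.

```python
def total_price(items, tax_rate):
    """Original fragment."""
    total = 0.0
    for item in items:
        total += item
    return total * (1 + tax_rate)


def total_price_type1(items, tax_rate):
    # Type 1: identical code, only comments and whitespace differ.
    total = 0.0
    for item in items:
        total  +=  item   # sum up
    return total * (1 + tax_rate)


def total_cost_type2(entries, vat):
    # Type 2: same syntax, but identifiers have been renamed.
    amount = 0.0
    for entry in entries:
        amount += entry
    return amount * (1 + vat)


def total_cost_type3(entries, vat):
    # Type 3: renamed copy with an added statement (negative entries are skipped).
    amount = 0.0
    for entry in entries:
        if entry < 0:
            continue
        amount += entry
    return amount * (1 + vat)
```

The type-3 copy behaves differently from the original for negative entries. Whether such a difference is intended or an oversight is exactly what cannot be seen from the code alone.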

Clone detection: Processing steps
• Pipeline: load → tokenise & normalise → find duplicates → extract clones → visualise
• Storage
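The processing steps can be sketched roughly as follows. This is a minimal illustration, not the algorithm of any particular tool: it assumes Python-like source text, a tiny hypothetical keyword set, line-based normalisation with regular expressions, and a fixed window of `min_len` normalised lines. Comment stripping is naive and would break on `#` inside string literals.

```python
import re
from collections import defaultdict

KEYWORDS = {"def", "return", "for", "in", "if", "else", "while"}  # assumed tiny keyword set


def normalise(line):
    """Strip comments and whitespace; replace identifiers and literals by placeholders."""
    line = line.split("#", 1)[0]
    tokens = re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S", line)
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            out.append("ID")
        elif re.fullmatch(r"\d+(?:\.\d+)?", tok):
            out.append("LIT")
        else:
            out.append(tok)
    return " ".join(out)


def find_clone_groups(files, min_len=3):
    """Return groups of (file, start_line) whose min_len normalised lines are identical."""
    index = defaultdict(list)
    for name, source in files.items():
        # Keep only non-empty normalised lines, remembering original line numbers.
        lines = [(no, normalise(text)) for no, text in enumerate(source.splitlines(), 1)]
        lines = [(no, norm) for no, norm in lines if norm]
        for i in range(len(lines) - min_len + 1):
            window = tuple(norm for _, norm in lines[i:i + min_len])
            index[window].append((name, lines[i][0]))
    return [locs for locs in index.values() if len(locs) > 1]
```

Because identifiers and literals are replaced by placeholders, this sketch finds type-1 and type-2 clones. Type-3 clones would need a detection step that tolerates gaps, which is not shown here.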

Measures for cloning
• Number of clone groups/clone instances
• Size of largest clone/cardinality of most frequent clone
• Cloned Statements: number of statements in the system being part of at least one clone
• Clone Coverage
  – #Cloned Statements / #Statements
  – Probability of a randomly chosen statement to be part of a clone
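As a worked example of the last measure, the following sketch computes clone coverage from clone instances given as statement ranges. The representation of an instance as an inclusive `(first_stmt, last_stmt)` pair is an assumption made for this illustration.

```python
def cloned_statements(clone_instances):
    """Set of statement indices that are part of at least one clone instance."""
    cloned = set()
    for first, last in clone_instances:
        cloned.update(range(first, last + 1))  # ranges are inclusive
    return cloned


def clone_coverage(total_statements, clone_instances):
    """#Cloned Statements / #Statements."""
    return len(cloned_statements(clone_instances)) / total_statements


# Overlapping instances are only counted once: statements 10..24 and 50..59.
coverage = clone_coverage(100, [(10, 19), (15, 24), (50, 59)])
```

With 25 distinct cloned statements out of 100, the coverage is 0.25: a randomly chosen statement is part of a clone with a probability of 25%.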

Visualisation of clone detection results
• Compare View (~20 LOC)
• Seesoft View (~400 LOC)
• Tree Maps (>1,000,000 LOC)
• Trends over Time

1 Code Clones

Inconsistencies Can you spot the difference?

How problematic are these inconsistencies (and clones)?
Indicating harmfulness
• [Lague97]: inconsistent evolution of clones in industrial telecom. SW.
• [Monden02]: higher revision number for files with clones in legacy SW.
• [Kim05]: substantial amount of coupled changes to code clones.
• [Li06], [SuChiu07] and [Aversano07], [Bakota07]: discovery of bugs through search for inconsistent clones or clone evolution analysis.
Doubting harmfulness
• [Krinke07]: inconsistent clones hardly ever become consistent later.
• [Geiger06]: failure to statistically verify impact of clones on change couplings.
• [Lozano08]: failure to statistically verify impact of clones on changeability.
• [Göde11]: most changes intentionally inconsistent.
• [Rahman12]: no statistically significant impacts on faults.

Our First Study at ICSE 2009
• Manual inspection of inconsistent clones by system developers
  – No indirect measures of consequences of cloning
• Both industrial and open source software analysed
• Quantitative data
Deissenboeck, Juergens, Hummel, Wagner, ICSE, 2009

Research Questions
• RQ1: Are clones changed inconsistently? |IC| / |C|
• RQ2: Are inconsistent clones created unintentionally? |UIC| / |IC|
• RQ3: Can inconsistent clones be indicators for faults in real systems? |F| / |IC|, |F| / |UIC|
Sets
• C: clone groups (exact and inconsistent)
• IC: inconsistent clone groups
• UIC: unintentionally inconsistent clone groups
• F: faulty clone groups

Study Design
1. Clone group candidate detection → CC (tool-detected clone group candidates)
  • Novel algorithm
  • Tailored to target program
2. False positive removal → C (clone groups, exact and inconsistent), IC (inconsistent clone groups)
  • Manual inspection of all inconsistent and ¼ exact CCs
  • Performed by researchers
3. Assessment of inconsistencies → UIC (unintentionally inconsistent clone groups), F (faulty clone groups)
  • All inconsistent clone groups inspected
  • Performed by developers

Study Objects
• International reinsurance company, 37,000 employees
• Munich-based life-insurance company, 400 employees
• Sysiphus: open source collaboration environment for distributed SW development, developed at TUM

System    Organization  Language  Age (years)  Size (kLoC)
A         Munich Re     C#        6            317
B         Munich Re     C#        4            454
C         Munich Re     C#        2            495
D         LV 1871       Cobol     17           197
Sysiphus  TUM           Java      8            281

Results
Project                 A    B    C    D    Sys.  Sum
Clone groups |C|        286  160  326  352  303   1427
Inconsistent CGs |IC|   159  89   179  151  146   724
Unint. incons. |UIC|    51   29   66   15   42    203
Faulty CGs |F|          19   18   42   5    23    107
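The ratios asked for in the research questions follow directly from these counts. The short calculation below uses only the numbers from the results slide; the variable names are chosen for this illustration.

```python
# Counts per system, copied from the results slide.
results = {
    "A": {"C": 286, "IC": 159, "UIC": 51, "F": 19},
    "B": {"C": 160, "IC": 89, "UIC": 29, "F": 18},
    "C": {"C": 326, "IC": 179, "UIC": 66, "F": 42},
    "D": {"C": 352, "IC": 151, "UIC": 15, "F": 5},
    "Sysiphus": {"C": 303, "IC": 146, "UIC": 42, "F": 23},
}

totals = {key: sum(row[key] for row in results.values()) for key in ("C", "IC", "UIC", "F")}

rq1 = totals["IC"] / totals["C"]       # share of inconsistent clone groups
rq2 = totals["UIC"] / totals["IC"]     # share of unintentional inconsistencies
rq3_ic = totals["F"] / totals["IC"]    # faulty among inconsistent clone groups
rq3_uic = totals["F"] / totals["UIC"]  # faulty among unintentionally inconsistent ones

print(f"RQ1 {rq1:.2f}, RQ2 {rq2:.2f}, RQ3 {rq3_ic:.2f} / {rq3_uic:.2f}")
```

Over all five systems, about 51% of the clone groups are inconsistent, about 28% of the inconsistent groups are unintentionally so, and faults were found in about 15% of the inconsistent and about 53% of the unintentionally inconsistent clone groups.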

Our Second Study
• Investigating evolution of type-3 clones
• Relationship with documented faults from issue tracker
• Industrial systems
Under review for SANER 2016

Research Questions
• RQ1: Do software systems contain type-3 clones? |CT3| / |C|
• RQ2: Do type-3 clones contain documented faults? |CT3 F| / |CT3|
• RQ3: Are developers aware of type-3 clones? |IMS| / |IM|, interviews with key developers
Sets
• C: clone groups (exact and inconsistent)
• CT3: inconsistent clone groups
• CT3 F: faulty clone groups

Data Collection and Analysis
[Figure: versions v1, v2, v3 are extracted and analysed; query for relationships and evolution; results as HTML and dashboard]

Study Objects (Java, automotive domain)
System  Size (KLOC)  Age (years)  Developers
A       253          4            10
B       332          5            5
C       454          4            10

Quantitative Results
System                                            A     B     C     Overall
Share of type-3 clones                            0.56  0.23  0.79  0.52
Share of faulty type-3 clone classes              0.33  0.05  0.03  0.17
Share of simultaneously modified type-3 clones    0.58  0.89  0.92  0.85

Qualitative Results (systems A, B, C; each x marks one system)
• General clone awareness: x
• No general clone awareness: x x
• No specific clone awareness: x x
• No clone check while bug fixing: x x
• Clone warning while developing: x
• Common code ownership: x
• Discussion about co-changes: x

Conclusions
• About half of all clone classes are type-3 clones.
• Rate of faulty type-3 clones is about 17%.
• There is a difference in awareness of clones and inconsistencies.
• This awareness seems to impact how many faults are related to type-3 clones.
• Further studies should take this into account.
• Making developers aware of clones still seems to be worthwhile.

2 Requirements Clones

"Redundancy [in requirements specifications] causes good engineers to suffer and the resulting systems will probably suffer, too."
– Matthias Weber, Joachim Weisbrod

Modifiability generally requires a requirements specification to […] not be redundant.
– IEEE 830-1998

Terms
Requirements specification
“specification for a particular software product, program, or set of programs that performs certain functions in a specific environment.” [IEEE 830-1998]
Clone
• Duplicated specification text of at least 20 words
• Small differences (e.g., declension) are tolerated
• Must refer to specified system
• False positives: e.g., page footers with copyright information
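The clone definition can be illustrated with a small sketch that looks for word sequences of at least 20 words occurring more than once. This is an assumption-laden simplification, not the detector used in the study: it compares lower-cased words exactly, so it does not tolerate small differences such as inflected word forms, and the function name is hypothetical.

```python
import re
from collections import defaultdict


def find_text_clones(documents, min_words=20):
    """Map each duplicated min_words-word sequence to the places where it occurs."""
    index = defaultdict(list)
    for name, text in documents.items():
        words = re.findall(r"\w+", text.lower())
        # Slide a window of min_words words over the document.
        for i in range(len(words) - min_words + 1):
            index[tuple(words[i:i + min_words])].append((name, i))
    return {seq: locs for seq, locs in index.items() if len(locs) > 1}
```

Overlapping windows that belong to the same longer passage would still have to be merged into one clone, and the remaining candidates inspected for false positives such as page footers.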

Research questions
1. How much cloning do real-world requirements specifications contain?
2. What kind of information is cloned in requirements specifications?
3. What consequences does cloning in requirements specifications have?
4. Can cloning in requirements specifications be detected accurately using existing clone detectors?

Study design
1. Random assignment of specifications
2. Detection tool execution
3. Inspection of detected clones
4. False positives?
  • Yes: adding of filters, then detection tool execution again
  • No: continue
5. Categorisation of clones
6. Independent re-categorisation
7. Analysis of corresp. source code
8. Data analysis & interpretation

Adding of filters
• Categorisation of the types of false positives
• Regular expressions
• Removal of clones
• Improvement in precision
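A filter step of this kind could look roughly like the sketch below. The patterns are hypothetical examples for the false positives mentioned earlier (page footers with copyright information); real filters would be tailored to each specification, and the helper names are assumptions made for this illustration.

```python
import re

# Hypothetical filter patterns; real ones would be tailored per specification.
FILTERS = [
    re.compile(r"copyright|all rights reserved", re.IGNORECASE),
    re.compile(r"^page \d+ of \d+", re.IGNORECASE),
]


def apply_filters(clone_texts, filters=FILTERS):
    """Drop clone candidates whose text matches any false-positive pattern."""
    return [t for t in clone_texts if not any(f.search(t) for f in filters)]


def precision(true_positives, reported):
    """Share of reported clones that are relevant."""
    return true_positives / reported if reported else 1.0


candidates = [
    "Copyright 2009 ACME. All rights reserved.",
    "Page 3 of 12",
    "The user logs in and the system shows the dashboard.",
]
remaining = apply_filters(candidates)
```

In this toy example only the third candidate is a relevant clone, so filtering raises precision from 1/3 to 1/1.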

Categorisation of clones
• Qualitative analysis: content analysis
• Sample is categorised
• Mix of theory-based and Grounded Theory
• 4+8 categories
• Documentation of additional information (mostly inconsistencies between clones)

Independent re-categorisation
• 2 raters
• Sample: 5 specifications
• Sample: 5 clone groups
• Analysis of inter-rater agreement
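The slides do not say which agreement statistic was used. As one common option, the following sketch computes Cohen's kappa for two raters who each assign a single category per clone group; both the choice of kappa and the example labels are assumptions made for this illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items (assumed statistic)."""
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical categorisations of five clone groups by two raters.
rater1 = ["use case step", "UI", "precondition", "UI", "feature"]
rater2 = ["use case step", "UI", "precondition", "feature", "feature"]
kappa = cohens_kappa(rater1, rater2)
```

Here the raters agree on four of five clone groups (observed agreement 0.8, chance agreement 0.24), which gives a kappa of about 0.74. The sketch does not handle the degenerate case in which chance agreement equals 1.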

Study objects
• 28 specifications
• 11 organisations
• 8,667 pages
• over 1.2 million words
• English & German
• Domains: automotive, avionics, finance, telecommunication, transport

Typical Clones
• Entire use cases copied
• Similar combinations of pre and post conditions copied
• Descriptions of terms or roles copied
Example*: 42 instances (61 words, 13 instances with > 100 words)
“The contracts with the clients describe the conditions regarding obligatory liabilities that the clients have agreed on with X. The liabilities are calculated from the exposures from Y and the contract conditions from X. The liability-relevant parts of the contracts thus need to be managed in system Z.”
*Translated from German

1. How much cloning do real-world requirements specifications contain?
[Bar chart: clone coverage in percent for each of the 28 specifications (labelled A to AC); coverage ranges from 0% to 71.6%]
Mean: 13.6%

2. What kind of information is cloned?
[Bar chart: percentage of clones per category, more than one category possible; bars range from 1% to 24%]
Categories: Use case step, Reference, UI, Domain knowledge, Interface description, Precondition, Side condition, Configuration, Feature, Techn. domain knowledge, Postcondition, Rationale

3. What consequences does cloning have?
[Bar chart: additional effort in hours per inspector for each of the 28 specifications; effort ranges from 0 to 36.7 hours]
Mean: 6 hours

Modification
• Multiple inconsistent specification clones identified
• Differences suspected to be unintentional
⇒ Indication that inconsistent updates happen in practice
Implementation
Traced specification clone groups to implementation. 3 cases:
• Shared abstraction
• Cloned code
• Independent reimplementation of similar functionality
⇒ Indication that spec. cloning causes redundancy in implementation

4. Can cloning be detected accurately using existing clone detectors?
[Bar chart: precision in percent per specification, before and after tailoring; before tailoring precision ranges from 2% to 100%, after tailoring from 85% to 100%]

Threats to validity
Internal
• Pairs of researchers to reduce errors during manual steps
• Reading speeds for cloned vs non-cloned text? Assumed similar. Further research required
• Recall unclear. But: does not affect study results
External
• Substantial differences between requirements specifications (format, organisation, language, …). But: large amount of study objects from different companies, domains

Conclusion
Lessons Learned
• Many specs contain cloning
• Negative impact on reading and inspection effort
• Indication for corresponding redundancy in source code
• Cloning not necessary – many specs contain none
• Tailoring required but feasible: effort small w.r.t. inspection overhead
Future Work
• How can cloning be avoided or removed?
• What are the causes for cloning? Different than for code clones?
• Further studies on consequences for implementation

3 Functional Similarity

Functional Similarities
• Not necessarily syntactically similar
• Type-4 clone: functionally similar code fragment regarding I/O behaviour

General Idea
• Execute candidate code fragments on random input
• Compare output
[Figure: detection pipeline from source code and libraries to type-4 clones]
Deissenboeck, Heinemann, Hummel, Wagner, CSMR 2012
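The general idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the detection pipeline from the paper: candidates are plain functions that take one list of integers, the input generator is hypothetical, and exceptions are compared by type.

```python
import random


def io_similar(f, g, gen_input, trials=200, seed=0):
    """Return True if f and g produce equal outputs on all random inputs tried."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = gen_input(rng)
        # Each candidate gets its own copy of the input; exceptions count as output.
        try:
            out_f = f(list(x))
        except Exception as exc:
            out_f = type(exc)
        try:
            out_g = g(list(x))
        except Exception as exc:
            out_g = type(exc)
        if out_f != out_g:
            return False
    return True


def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total


def sum_builtin(xs):
    return sum(xs)


def random_int_list(rng):
    """Hypothetical input generator: up to 10 integers between -100 and 100."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
```

Equal outputs on all tried inputs only make the two fragments clone candidates. Random inputs cannot prove functional equivalence, and the result depends heavily on how well the generated inputs exercise the code.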

Study Objects
System             SLOC
Commons Lang       17,504
Freemind           51,762
Jabref             74,586
Jetty              29,8
JHotDraw           78,902
Info1 submissions  8 – 55

Discussion
• Low results: no type-4 clones or flaws in detection?
• Main limitation: random testing approach → no input can be generated, or the generated input does not achieve sufficient code coverage
• Notion of I/O similarity may not be suitable → e.g., different data types or signatures
• Further research required to quantify these problems

So how are functionally similar clones (FSC) different?
• RQ 1: What share of independently developed similar programs are type-1–3 clones?
• RQ 2: What are the differences between FSC that go beyond type-1–3 clones?
• RQ 3: What share of FSC can be detected by a type-4 clone detector?
• RQ 4: What should a benchmark contain that represents the differences between FSC?
Wagner et al., PeerJ Preprints, https://dx.doi.org/10.7287/peerj.preprints.1516v1

Data Collection and Analysis
[Figure: solutions (Sol. 1, Sol. 2, Sol. 3) are extracted and analysed; share of syntactic similarity; manual qualitative analysis; CCCD and Deckard; results: characteristics of differences and benchmark; output as HTML and dashboard]

How syntactically similar are FSC?
[Bar chart: share of partial and full clones in %, detected by ConQAT and Deckard, for Java and C; all shares lie between 0% and 11.53%]

What are other differences in FSC?
• Algorithms
• Input/output
• Libraries
• Object-oriented design
• Data structures
Degree of difference: low, medium, high

What can CCCD detect?
Full and partial clone recall for CCCD (in %)
         Mean   SD
Partial  16.03  0.07
Full     0.10   0.00

A Benchmark to represent these differences
[Figure: Structure of the benchmark set. Benchmark → Language (Java, C) → Category (Data, OO-Design, ...) → Degree of Diff. (low, medium, high) → Clone Kind (partial, full) → Solution (file): left, right]
Available at: https://github.com/SE-Stuttgart/clone-study

Conclusions
Lessons Learned
• Independently developed FSCs have very little syntactic similarity.
• Type-1–3 detectors will not reliably detect them.
• Newer approaches, such as CCCD, improve that but not by much.
Future Work
• Future proposals for type-4 detectors can use the categorisation and benchmark as a "todo" list.
• Probably a combination of static and dynamic analyses needed

Outlook
• Other artefacts, e.g. test cases
• Effects and costs of cloning
• Functionally similar code detector

We need to detect clones reliably and automatically.

Pictures Used in this Slide Deck
"Mercurial Logo" by Mackall (http://www.selenic.com/hg-logo/)
