Empirical Results on Cloning and Clone Detection

Empirical Results on Cloning and Clone Detection
Stefan Wagner (@prof_wagnerst), www.uni-stuttgart.de
Universität Bremen, 27 January 2016

You can copy, share and change, film and photograph, blog, live-blog and tweet this presentation given that you attribute it to its author and respect the rights and licences of its parts.
Based on templates by @SMEasterbrook and @ethanwhite

Technische Universität München

Class A Class B

Often 20%–30% redundancy

We need to detect clones reliably and automatically.

Types of Clones
• Type 1: an exact copy without modifications (except for whitespace and comments)
• Type 2: a syntactically identical copy; only variable, type, or function identifiers have been changed
• Type 3: a copy with further modifications; statements have been changed, added, or removed

Clone detection: Processing steps
Storage → load → tokenise & normalise → find duplicates → extract clones → visualise
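The processing steps can be illustrated with a minimal sketch in Python. It is not the detector used in the studies: it assumes Python source as input, approximates a statement by a source line, and only finds type-1 and type-2 clones of a fixed minimum length. The names `normalise`, `find_clone_groups` and `min_len` are hypothetical.

```python
import io
import keyword
import tokenize
from collections import defaultdict


def normalise(source):
    """Tokenise Python source; return (line number, normalised text) per line.

    Comments and layout tokens are dropped. Identifiers become ID and
    literals become LIT, so renamed copies (type 2) look identical.
    """
    skipped = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
               tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER)
    lines = defaultdict(list)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skipped:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            text = "ID"
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            text = "LIT"
        else:
            text = tok.string
        lines[tok.start[0]].append(text)
    return [(lineno, " ".join(parts)) for lineno, parts in sorted(lines.items())]


def find_clone_groups(files, min_len=3):
    """Find windows of min_len consecutive normalised lines occurring more than once.

    Returns a list of clone groups; each group is a list of
    (file name, first line, last line) tuples.
    """
    index = defaultdict(list)
    for name, source in files.items():
        stmts = normalise(source)
        for i in range(len(stmts) - min_len + 1):
            window = tuple(text for _, text in stmts[i:i + min_len])
            index[window].append((name, stmts[i][0], stmts[i + min_len - 1][0]))
    return [locations for locations in index.values() if len(locations) > 1]


# Two fragments that differ only in identifiers and a comment (a type-2 clone).
files = {
    "a.py": "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n",
    "b.py": "# sum prices\ndef price_sum(prices):\n    acc = 0\n"
            "    for p in prices:\n        acc += p\n    return acc\n",
}
for group in find_clone_groups(files, min_len=5):
    print(group)
```

A production detector would additionally merge overlapping windows into maximal clones and tolerate gaps to find type-3 clones; both are left out here to keep the sketch short.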

Measures for cloning
• Number of clone groups/clone instances
• Size of largest clone/cardinality of most frequent clone
• Cloned Statements: number of statements in the system being part of at least one clone
• Clone Coverage
  – #Cloned Statements / #Statements
  – Probability of a randomly chosen statement to be part of a clone
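As a minimal sketch of how these measures could be computed, the following Python function takes the total number of statements and a list of clone groups. The input representation (each clone instance as an inclusive range of statement indices) and the function name `clone_measures` are assumptions made for illustration.

```python
def clone_measures(total_statements, clone_groups):
    """Compute cloning measures.

    clone_groups: list of groups; each group is a list of (start, end)
    statement index ranges (inclusive), one range per clone instance.
    """
    cloned = set()
    for group in clone_groups:
        for start, end in group:
            # A statement counts once, even if it is part of several clones.
            cloned.update(range(start, end + 1))
    sizes = [end - start + 1 for group in clone_groups for start, end in group]
    return {
        "clone_groups": len(clone_groups),
        "clone_instances": len(sizes),
        "largest_clone": max(sizes, default=0),
        "max_cardinality": max((len(g) for g in clone_groups), default=0),
        "cloned_statements": len(cloned),
        "clone_coverage": len(cloned) / total_statements if total_statements else 0.0,
    }


# Hypothetical system with 100 statements and two clone groups.
example = clone_measures(100, [[(0, 9), (20, 29)], [(5, 14), (40, 49), (60, 69)]])
print(example)
```

Because overlapping clones are counted only once, clone coverage stays between 0 and 1 and can be read as the probability that a randomly chosen statement is part of a clone.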

Visualisation of clone detection results
• Compare View (~20 LOC)
• Seesoft View (~400 LOC)
• Tree Maps (>1,000,000 LOC)
• Trends over Time

SME Study Technology Transfer of Quality Assurance Techniques

Cloning results
• 3 study objects: clone coverage 13.7 – 25.5%, blow-up 110 – 123%
• 1 study object: clone coverage 68 – 79.4%, blow-up 239 – 336%
• 1 study object: clone coverage 36.7 – 45.4%, blow-up 137 – 150%

Perceived Usefulness
Questions on clone detection:
• Experience in ASA techniques: never
• Relevance of study results: high
• Priority in future QA plans: very high
Gleirscher, Irlbeck, Wagner, Software Quality Journal, 2013

1 Effects of Code Clones

Inconsistencies Can you spot the difference?

How problematic are these inconsistencies (and clones)?
Indicating harmfulness
• [Lague97]: inconsistent evolution of clones in industrial telecom. SW.
• [Monden02]: higher revision number for files with clones in legacy SW.
• [Kim05]: substantial amount of coupled changes to code clones.
• [Li06], [SuChiu07] and [Aversano07], [Bakota07]: discovery of bugs through search for inconsistent clones or clone evolution analysis.
Doubting harmfulness
• [Krinke07]: inconsistent clones hardly ever become consistent later.
• [Geiger06]: failure to statistically verify impact of clones on change couplings.
• [Lozano08]: failure to statistically verify impact of clones on changeability.
• [Göde11]: most changes intentionally inconsistent.
• [Rahman12]: no statistically significant impacts on faults.

Our First Study at ICSE 2009
• Manual inspection of inconsistent clones by system developers
  – No indirect measures of consequences of cloning
• Both industrial and open source software analysed
• Quantitative data
Deissenboeck, Juergens, Hummel, Wagner, ICSE, 2009

Research Questions
• RQ1: Are clones changed inconsistently? |IC| / |C|
• RQ2: Are inconsistent clones created unintentionally? |UIC| / |IC|
• RQ3: Can inconsistent clones be indicators for faults in real systems? |F| / |IC|, |F| / |UIC|
Sets used:
• C: clone groups (exact and inconsistent)
• IC: inconsistent clone groups
• UIC: unintentionally inconsistent clone groups
• F: faulty clone groups

Study Design
• Clone group candidate detection
  – Novel algorithm
  – Tailored to target program
  – Result: tool-detected clone group candidates CC
• False positive removal
  – Manual inspection of all inconsistent and ¼ exact CCs
  – Performed by researchers
  – Result: clone groups C (exact and inconsistent), inconsistent clone groups IC
• Assessment of inconsistencies
  – All inconsistent clone groups inspected
  – Performed by developers
  – Result: unintentionally inconsistent clone groups UIC, faulty clone groups F
Flow: → CC → C, IC → UIC, F

Study Objects
System    Organization  Language  Age (years)  Size (kLoC)
A         Munich Re     C#        6            317
B         Munich Re     C#        4            454
C         Munich Re     C#        2            495
D         LV 1871       Cobol     17           197
Sysiphus  TUM           Java      8            281
• Munich Re: international reinsurance company, 37,000 employees
• LV 1871: Munich-based life-insurance company, 400 employees
• Sysiphus: open source collaboration environment for distributed SW development, developed at TUM

Results
Project                   A    B    C    D    Sys.  Sum
Clone groups |C|          286  160  326  352  303   1427
Inconsistent CGs |IC|     159  89   179  151  146   724
Unint. incons. CGs |UIC|  51   29   66   15   42    203
Faulty CGs |F|            19   18   42   5    23    107
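The ratios named in the research questions follow directly from these counts. The short Python sketch below recomputes them from the table; the dictionary layout and the function name `rq_ratios` are illustrative choices, not part of the study.

```python
# Counts per project, copied from the results table.
results = {
    "A":    {"C": 286, "IC": 159, "UIC": 51, "F": 19},
    "B":    {"C": 160, "IC": 89,  "UIC": 29, "F": 18},
    "C":    {"C": 326, "IC": 179, "UIC": 66, "F": 42},
    "D":    {"C": 352, "IC": 151, "UIC": 15, "F": 5},
    "Sys.": {"C": 303, "IC": 146, "UIC": 42, "F": 23},
}


def rq_ratios(row):
    """Ratios used to answer RQ1 to RQ3 for one row of counts."""
    return {
        "RQ1 |IC|/|C|": row["IC"] / row["C"],
        "RQ2 |UIC|/|IC|": row["UIC"] / row["IC"],
        "RQ3 |F|/|IC|": row["F"] / row["IC"],
        "RQ3 |F|/|UIC|": row["F"] / row["UIC"],
    }


# Column sums reproduce the "Sum" column of the table.
total = {key: sum(row[key] for row in results.values()) for key in ("C", "IC", "UIC", "F")}
overall = rq_ratios(total)
for name, value in overall.items():
    print(f"{name}: {value:.2f}")
```

Over all five systems this gives roughly 0.51 for RQ1, 0.28 for RQ2, and 0.15 and 0.53 for the two RQ3 ratios: about half of the clone groups are inconsistent, and about half of the unintentionally inconsistent ones are faulty.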

Our Second Study
• Investigating evolution of type-3 clones
• Relationship with documented faults from issue tracker
• Industrial systems
Accepted at SANER 2016

Research Questions
• RQ1: Do software systems contain type-3 clones? |CT3| / |C|
• RQ2: Do type-3 clones contain documented faults? |CT3 F| / |CT3|
• RQ3: Are developers aware of type-3 clones? |IMS| / |IM|, interviews with key developers
Sets used:
• C: clone groups (exact and inconsistent)
• CT3: inconsistent clone groups
• CT3 F: faulty clone groups

Data Collection and Analysis
[Figure: data collection and analysis pipeline. Labels: versions v1, v2, v3; Extract; Analyse; Query for relationships and evolution; HTML dashboard]

Study Objects
System  Size (KLOC)  Age (Years)  Developers
A       253          4            10
B       332          5            5
C       454          4            10
All systems: Java, automotive domain

Quantitative Results
System                                          A     B     C     Overall
Share of type-3 clones                          0.56  0.23  0.79  0.52
Share of faulty type-3 clone classes            0.33  0.05  0.03  0.17
Share of simultaneously modified type-3 clones  0.58  0.89  0.92  0.85

Qualitative Results (each x marks one of the systems A, B, C)
• General clone awareness: x
• No general clone awareness: x x
• No specific clone awareness: x x
• No clone check while bug fixing: x x
• Clone warning while developing: x
• Common code ownership: x
• Discussion about co-changes: x

Conclusions
• About half of all clone classes are type-3 clones.
• Rate of faulty type-3 clones is about 17%.
• There is a difference in awareness of clones and inconsistencies.
• This awareness seems to impact how many faults are related to type-3 clones.
• Further studies should take this into account.
• Making developers aware of clones still seems to be worthwhile.

2 Functional Similarity

Functional Similarities
• Not necessarily syntactically similar
• Type-4 clone: functionally similar code fragment regarding I/O behaviour

First Idea
• Execute candidate code fragments on random input
• Compare output
[Figure: detection pipeline from source code and libraries to type-4 clones]
Deissenboeck, Heinemann, Hummel, Wagner, CSMR 2012
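The idea of comparing I/O behaviour on random input can be sketched in a few lines of Python. This is only an illustration of the principle, not the pipeline from the CSMR 2012 study (which analysed Java systems); the helper names `outcome` and `io_similar`, the input generator, and the number of trials are all assumptions.

```python
import copy
import random


def outcome(func, args):
    """Run func on a private copy of args; capture the result or the exception type."""
    try:
        return ("ok", func(*copy.deepcopy(args)))
    except Exception as exc:
        return ("error", type(exc).__name__)


def io_similar(f, g, gen_input, trials=200, seed=0):
    """Treat f and g as type-4 clone candidates if they agree on all random inputs.

    Agreement on sampled inputs is only evidence of equivalence, not proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        args = gen_input(rng)
        if outcome(f, args) != outcome(g, args):
            return False
    return True


def sum_loop(xs):
    # Syntactically different from the built-in sum, functionally the same.
    acc = 0
    for x in xs:
        acc += x
    return acc


def random_int_list(rng):
    """Generate one argument tuple: a list of up to 8 small integers."""
    return ([rng.randint(-50, 50) for _ in range(rng.randint(0, 8))],)


print(io_similar(sum_loop, sum, random_int_list))
print(io_similar(sum, max, random_int_list))
```

The sketch also makes the limitations visible: the input generator must match the parameter types of both fragments, and random inputs may miss the branches where two fragments differ.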

Study Objects
System             SLOC
Commons Lang       17,504
Freemind           51,762
Jabref             74,586
Jetty              29,8
JHotDraw           78,902
Info1 submissions  8 – 55

Percentage of fragments that are type-4 clones
[Chart: systems Freemind, JHotDraw, Jetty, JabRef, Commons Lang, Info1; extracted values: 43.75, 3.51, 2.64, 1.03, 0.64, 0.55]

Discussion
• Low results: no type-4 clones or flaws in detection?
• Main limitation: random testing approach → no input can be generated, or the generated input does not achieve sufficient code coverage
• Notion of I/O similarity may not be suitable → e.g. different data types or signatures
• Further research required to quantify these problems

So how are functionally similar clones (FSC) different?
• RQ 1: What share of independently developed similar programs are type-1–3 clones?
• RQ 2: What are the differences between FSC that go beyond type-1–3 clones?
• RQ 3: What share of FSC can be detected by a type-4 clone detector?
• RQ 4: What should a benchmark contain that represents the differences between FSC?
Wagner et al., PeerJ Preprints, https://dx.doi.org/10.7287/peerj.preprints.1516v1

Data Collection and Analysis
[Figure: data collection and analysis pipeline. Labels: solutions Sol. 1, Sol. 2, Sol. 3; Extract; Analyse; HTML dashboard; Share of syntactic similarity; Manual qualitative analysis; CCCD; Deckard; Characteristics of differences; Benchmark]

How syntactically similar are FSC? (in %)
[Chart: for Java and C, bars for ConQAT (partial), Deckard (partial), ConQAT (full), Deckard (full); extracted values: 0.01, 0, 1.73, 0, 1.44, 0.87, 11.48, 11.53]

What are other differences in FSC?
• Categories: algorithms, input/output, libraries, object-oriented design, data structures
• Degree of difference: low, medium, high

What can CCCD detect?
Full and partial clone recall for CCCD (in %)
         Mean   SD
Partial  16.03  0.07
Full     0.10   0.00

A Benchmark to represent these differences
Structure of the benchmark set:
• Language: Java, C
• Category: Data, OO-Design, ...
• Degree of Diff.: low, medium, high
• Clone Kind: partial, full
• Solution (file): left, right
Available at: https://github.com/SE-Stuttgart/clone-study

Conclusions
Lessons Learned
• Independently developed FSCs have very little syntactic similarity.
• Type-1–3 detectors will not reliably detect them.
• Newer approaches, such as CCCD, improve that but not by much.
Future Work
• Future proposals for type-4 detectors can use the categorisation and benchmark as a "todo" list.
• Probably a combination of static and dynamic analyses is needed.

Outlook
• Factors influencing the effects of cloning
• Detector for functionally similar code

We need to detect clones reliably and automatically.

Pictures Used in this Slide Deck „Mercurial Logo“ by Mackall
(http://www.selenic.com/hg-logo/)
