Empirical Results on Cloning and Clone Detection

Stefan Wagner
January 27, 2016

This talk covers two aspects on which we recently conducted empirical studies: (1) the relationship between type-3 clones and faults and (2) functionally similar clones (type-4).

Transcript

  1. You can copy, share and change, film and photograph, blog, live-blog and tweet this presentation given that you attribute it to its author and respect the rights and licences of its parts. Based on templates by @SMEasterbrook and @ethanwhite
  2. Types of Clones
    • Type 1: an exact copy without modifications (except for whitespace and comments)
    • Type 2: a syntactically identical copy; only variable, type, or function identifiers have been changed
    • Type 3: a copy with further modifications; statements have been changed, added, or removed
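The three clone types can be illustrated with a small sketch. The functions below are hypothetical and invented purely for illustration; they are not taken from the studied systems. The function bodies are the fragments being compared, and the function names differ only so that the file runs as one module.

```python
def original(prices, quantities):
    total = 0
    for p, q in zip(prices, quantities):
        total += p * q
    return total


def type1(prices, quantities):
    # Type 1: same tokens as the original; only comments and whitespace differ.
    total = 0

    for p, q in zip(prices, quantities):
        total += p * q
    return total


def type2(costs, counts):
    # Type 2: same syntactic structure; only identifiers were renamed.
    result = 0
    for c, n in zip(costs, counts):
        result += c * n
    return result


def type3(costs, counts):
    # Type 3: a statement was added (the guard) and one was changed (rounding).
    result = 0
    for c, n in zip(costs, counts):
        if n <= 0:
            continue
        result += c * n
    return round(result, 2)
```

The type-3 copy is the interesting case for the studies that follow: it agrees with the original on many inputs but silently diverges on others (here, negative quantities), which is the kind of inconsistency a developer may or may not have intended.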
  3. Measures for cloning
    • Number of clone groups/clone instances
    • Size of largest clone/cardinality of most frequent clone
    • Cloned Statements: number of statements in the system being part of at least one clone
    • Clone Coverage
      – #Cloned Statements / #Statements
      – Probability of a randomly chosen statement to be part of a clone
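As a minimal sketch of how these measures relate, assume a detector reports clone groups as lists of instances, where each instance is a set of statement ids. That data layout and the function name `clone_measures` are assumptions made for this example, not the output format of any particular tool.

```python
def clone_measures(total_statements, clone_groups):
    """Compute basic cloning measures.

    clone_groups: list of groups; each group is a list of instances;
    each instance is a set of statement ids (an assumed representation).
    """
    if total_statements <= 0:
        raise ValueError("total_statements must be positive")

    # A statement counts once, even if it belongs to several clones.
    cloned = set()
    for group in clone_groups:
        for instance in group:
            cloned |= set(instance)

    return {
        "clone_groups": len(clone_groups),
        "clone_instances": sum(len(group) for group in clone_groups),
        "cloned_statements": len(cloned),
        "clone_coverage": len(cloned) / total_statements,
    }


# Toy system with 20 statements and two overlapping clone groups.
groups = [
    [{1, 2, 3}, {10, 11, 12}],
    [{2, 3, 4}, {15, 16, 17}, {18, 19, 20}],
]
measures = clone_measures(20, groups)
```

Statements 2 and 3 belong to both groups but are counted once, so the toy system has 13 cloned statements and a clone coverage of 0.65.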
  4. Visualisation of clone detection results
    • Compare View (~20 LOC)
    • Seesoft View (~400 LOC)
    • Tree Maps (>1,000,000 LOC)
    • Trends over Time
  5. Cloning results
    • 3 study objects: clone coverage 13.7 – 25.5%, blow-up 110 – 123%
    • 1 study object: clone coverage 68 – 79.4%, blow-up 239 – 336%
    • 1 study object: clone coverage 36.7 – 45.4%, blow-up 137 – 150%
  6. Perceived Usefulness
    Questions, with answers for clone detection:
    • Experience in ASA techniques: never
    • Relevance of study results: high
    • Priority in future QA plans: very high
    Gleirscher, Irlbeck, Wagner, Software Quality Journal, 2013
  7. How problematic are these inconsistencies (and clones)?
    Indicating harmfulness
    • [Lague97]: inconsistent evolution of clones in industrial telecom. SW.
    • [Monden02]: higher revision number for files with clones in legacy SW.
    • [Kim05]: substantial amount of coupled changes to code clones.
    • [Li06], [SuChiu07] and [Aversano07], [Bakota07]: discovery of bugs through search for inconsistent clones or clone evolution analysis.
    Doubting harmfulness
    • [Krinke07]: inconsistent clones hardly ever become consistent later.
    • [Geiger06]: failure to statistically verify impact of clones on change couplings.
    • [Lozano08]: failure to statistically verify impact of clones on changeability.
    • [Göde11]: most changes intentionally inconsistent.
    • [Rahman12]: no statistically significant impacts on faults.
  8. Our First Study at ICSE 2009
    • Manual inspection of inconsistent clones by system developers; no indirect measures of consequences of cloning
    • Both industrial and open source software analysed
    • Quantitative data
    Deissenboeck, Juergens, Hummel, Wagner, ICSE, 2009
  9. Research Questions
    • RQ1: Are clones changed inconsistently? |IC| / |C|
    • RQ2: Are inconsistent clones created unintentionally? |UIC| / |IC|
    • RQ3: Can inconsistent clones be indicators for faults in real systems? |F| / |IC|, |F| / |UIC|
    Notation
    • C: clone groups (exact and inconsistent)
    • IC: inconsistent clone groups
    • UIC: unintentionally inconsistent clone groups
    • F: faulty clone groups
  10. Study Design
    • Clone group candidate detection (yields tool-detected clone group candidates CC)
      – Novel algorithm
      – Tailored to target program
    • False positive removal (CC → C, IC)
      – Manual inspection of all inconsistent and ¼ of the exact CCs
      – Performed by researchers
    • Assessment of inconsistencies (IC → UIC, F)
      – All inconsistent clone groups inspected
      – Performed by developers
    Notation: clone groups C (exact and inconsistent), inconsistent clone groups IC, unintentionally inconsistent clone groups UIC, faulty clone groups F
  11. Study Objects
    System     Organization  Language  Age (years)  Size (kLoC)
    A          Munich Re     C#        6            317
    B          Munich Re     C#        4            454
    C          Munich Re     C#        2            495
    D          LV 1871       Cobol     17           197
    Sysiphus   TUM           Java      8            281
    • Munich Re: international reinsurance company, 37,000 employees
    • LV 1871: Munich-based life-insurance company, 400 employees
    • Sysiphus: open source collaboration environment for distributed SW development, developed at TUM
  12. Results
    Project                  A     B     C     D     Sys.  Sum
    Clone groups |C|         286   160   326   352   303   1427
    Inconsistent CGs |IC|    159   89    179   151   146   724
    Unint. incons. |UIC|     51    29    66    15    42    203
    Faulty CGs |F|           19    18    42    5     23    107
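The ratios named in the research questions (|IC| / |C|, |UIC| / |IC|, |F| / |IC|, |F| / |UIC|) follow directly from the counts on this slide. The short sketch below recomputes them from the table; only the dictionary layout and the variable names are assumptions of this example.

```python
# Counts per project, copied from the results table.
results = {
    "A": {"C": 286, "IC": 159, "UIC": 51, "F": 19},
    "B": {"C": 160, "IC": 89, "UIC": 29, "F": 18},
    "C": {"C": 326, "IC": 179, "UIC": 66, "F": 42},
    "D": {"C": 352, "IC": 151, "UIC": 15, "F": 5},
    "Sysiphus": {"C": 303, "IC": 146, "UIC": 42, "F": 23},
}

# Column sums across all five projects.
totals = {
    key: sum(project[key] for project in results.values())
    for key in ("C", "IC", "UIC", "F")
}

ratios = {
    "RQ1 |IC|/|C|": round(totals["IC"] / totals["C"], 2),
    "RQ2 |UIC|/|IC|": round(totals["UIC"] / totals["IC"], 2),
    "RQ3 |F|/|IC|": round(totals["F"] / totals["IC"], 2),
    "RQ3 |F|/|UIC|": round(totals["F"] / totals["UIC"], 2),
}

for name, value in ratios.items():
    print(f"{name}: {value}")
```

Over all five systems this gives roughly 0.51 for RQ1, 0.28 for RQ2, and 0.15 and 0.53 for the two RQ3 ratios: about half of the clone groups are inconsistent, and about half of the unintentionally inconsistent ones are faulty.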
  13. Our Second Study
    • Investigating evolution of type-3 clones
    • Relationship with documented faults from issue tracker
    • Industrial systems
    Accepted at SANER 2016
  14. Research Questions
    • RQ1: Do software systems contain type-3 clones? |CT3| / |C|
    • RQ2: Do type-3 clones contain documented faults? |CT3F| / |CT3|
    • RQ3: Are developers aware of type-3 clones? |IMS| / |IM|, interviews with key developers
    Notation
    • C: clone groups (exact and inconsistent)
    • CT3: inconsistent clone groups
    • CT3F: faulty clone groups
  15. Data Collection and Analysis
    [Diagram; labels: v1, v2, v3; Extract; Analyse; HTML Dashboard; Extract; Query for relationships and evolution]
  16. Study Objects
    System  Size (KLOC)  Age (years)  Developers
    A       253          4            10
    B       332          5            5
    C       454          4            10
    All systems: Java, automotive domain
  17. Quantitative Results
    System                                           A     B     C     Overall
    Share of type-3 clones                           0.56  0.23  0.79  0.52
    Share of faulty type-3 clone classes             0.33  0.05  0.03  0.17
    Share of simultaneously modified type-3 clones   0.58  0.89  0.92  0.85
  18. Qualitative Results (columns: System A, B, C)
    • General clone awareness: x
    • No general clone awareness: x x
    • No specific clone awareness: x x
    • No clone check while bug fixing: x x
    • Clone warning while developing: x
    • Common code ownership: x
    • Discussion about co-changes: x
  19. Conclusions
    • About half of all clone classes are type-3 clones.
    • Rate of faulty type-3 clones is about 17%.
    • There is a difference in awareness of clones and inconsistencies.
    • This awareness seems to impact how many faults are related to type-3 clones.
    • Further studies should take this into account.
    • Making developers aware of clones still seems to be worthwhile.
  20. Type-4 Clones: First Idea
    • Execute candidate code fragments on random input
    • Compare output
    [Diagram; labels: Source Code, Libraries, Detection Pipeline]
    Deissenboeck, Heinemann, Hummel, Wagner, CSMR 2012
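The idea on this slide can be sketched in a few lines. This is a deliberately simplified illustration and not the detection pipeline from the cited paper: the helper names (`run`, `same_io_behaviour`, `random_int_list`), the candidate functions, and the fixed input generator are all assumptions of this example.

```python
import copy
import random


def run(fn, args):
    """Run fn on a private copy of args; capture the result or the exception type."""
    try:
        return ("ok", fn(*copy.deepcopy(args)))
    except Exception as exc:
        return ("error", type(exc).__name__)


def same_io_behaviour(f, g, gen_input, trials=100, seed=0):
    """Return True if f and g agree on every randomly generated input."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    for _ in range(trials):
        args = gen_input(rng)
        if run(f, args) != run(g, args):
            return False
    return True


def random_int_list(rng):
    """Hypothetical generator: one argument, a list of 0 to 10 small integers."""
    return ([rng.randint(-50, 50) for _ in range(rng.randint(0, 10))],)


# Candidate fragments (invented for this example).
def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total


def sum_builtin(xs):
    return sum(xs)


def sum_plus_one(xs):
    return sum(xs) + 1


print(same_io_behaviour(sum_loop, sum_builtin, random_int_list))   # candidate pair
print(same_io_behaviour(sum_loop, sum_plus_one, random_int_list))  # rejected pair
```

Agreement on random inputs is only evidence of functional similarity, not proof, and the approach depends on being able to generate inputs that exercise enough of each fragment.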
  21. Study Objects
    System              SLOC
    Commons Lang        17,504
    Freemind            51,762
    Jabref              74,586
    Jetty               29,8
    JHotDraw            78,902
    Info1 submissions   8 – 55
  22. Percentage of fragments that are type-4 clones
    [Bar chart; labels in extraction order: Freemind, JHotDraw, Jetty, JabRef, Comm. Lang, Info1; values in extraction order: 43.75, 3.51, 2.64, 1.03, 0.64, 0.55]
  23. Discussion
    • Low results: no type-4 clones or flaws in detection?
    • Main limitation: random testing approach; either no input can be generated, or the generated input does not achieve sufficient code coverage
    • Notion of I/O similarity may not be suitable, e.g., different data types or signatures
    • Further research required to quantify these problems
  24. So how are functionally similar clones (FSC) different?
    • RQ 1: What share of independently developed similar programs are type-1–3 clones?
    • RQ 2: What are the differences between FSC that go beyond type-1–3 clones?
    • RQ 3: What share of FSC can be detected by a type-4 clone detector?
    • RQ 4: What should a benchmark contain that represents the differences between FSC?
    Wagner et al., PeerJ Preprints, https://dx.doi.org/10.7287/peerj.preprints.1516v1
  25. Data Collection and Analysis
    [Diagram; labels: Sol. 1, Sol. 2, Sol. 3; Extract; Analyse; HTML Dashboard; Share of syntactic similarity; Manual qualitative analysis; CCCD; Deckard; Characteristics of differences; Benchmark]
  26. How syntactically similar are FSC? (in %)
    [Bar chart; panels: Java, C; bars per panel: ConQAT (partial), Deckard (partial), ConQAT (full), Deckard (full); values in extraction order: 0.01, 0, 1.73, 0, 1.44, 0.87, 11.48, 11.53]
  27. What are other differences in FSC?
    • Categories: Algorithms, Input/output, Libraries, Object-oriented design, Data structures
    • Degree of difference: low, medium, high
  28. What can CCCD detect?
    Full and partial clone recall for CCCD (in %)
             Mean   SD
    Partial  16.03  0.07
    Full     0.10   0.00
  29. A Benchmark to represent these differences
    Structure of the benchmark set:
    • Language: Java, C
    • Category: Data, OO-Design, ...
    • Degree of Diff.: low, medium, high
    • Clone Kind: partial, full
    • Solution (file): left, right
    Available at: https://github.com/SE-Stuttgart/clone-study
  30. Conclusions
    Lessons Learned
    • Independently developed FSCs have very little syntactic similarity.
    • Type-1–3 detectors will not reliably detect them.
    • Newer approaches, such as CCCD, improve that but not by much.
    Future Work
    • Future proposals for type-4 detectors can use the categorisation and benchmark as a "to-do" list.
    • Probably a combination of static and dynamic analyses is needed.