Empirical Results on Cloning and Clone Detection

Stefan Wagner
November 27, 2015

I talked about software cloning at the PUMA research training group of LMU and TU Munich. I reported on our empirical results on clone detection in software artefacts and described our first activities towards detecting functionally similar clones.

Transcript

  1. You can copy, share and change, film and photograph, blog, live-blog and tweet this presentation given that you attribute it to its author and respect the rights and licences of its parts. Based on templates by @SMEasterbrook and @ethanwhite.
  2. Types of Clones

    • Type 1: an exact copy without modifications (except for whitespace and comments)
    • Type 2: a syntactically identical copy; only variable, type, or function identifiers have been changed
    • Type 3: a copy with further modifications; statements have been changed, added, or removed
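
The clone types above can be illustrated with a small sketch. It is not the detection algorithm used in the studies; it assumes a toy Java-like fragment, a hand-picked keyword set, and a regex tokeniser (all hypothetical). Comments inside string literals are not handled. Anything that is neither type 1 nor type 2 is left as a possible type-3 candidate.

```python
import re

# Assumed, simplified keyword set; type names such as "int" are treated as
# identifiers so that changed types still count as a type-2 clone.
KEYWORDS = {"return", "if", "else", "for", "while"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def strip_comments(code: str) -> str:
    """Remove /* ... */ and // ... comments (naive, ignores string literals)."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", " ", code)


def tokens(code: str) -> list[str]:
    """Token sequence without whitespace and comments."""
    return TOKEN.findall(strip_comments(code))


def normalised(code: str) -> list[str]:
    """Token sequence with every identifier replaced by the placeholder ID."""
    return [
        "ID" if re.match(r"[A-Za-z_]", t) and t not in KEYWORDS else t
        for t in tokens(code)
    ]


def clone_type(a: str, b: str):
    """Return 1 or 2 for type-1/type-2 clones, None otherwise."""
    if tokens(a) == tokens(b):
        return 1
    if normalised(a) == normalised(b):
        return 2
    return None


original = "int sum(int a, int b) { return a + b; } // add"
reformatted = "int   sum(int a, int b) {\n  return a + b;\n}"
renamed = "long total(long x, long y) { return x + y; }"
modified = "int sum(int a, int b) { return a + b + 1; }"

print(clone_type(original, reformatted))  # type 1
print(clone_type(original, renamed))      # type 2
print(clone_type(original, modified))     # None, a possible type-3 candidate
```

The normaliser maps all identifiers to the same placeholder, so it does not check that renaming is consistent; a stricter variant would number the placeholders per identifier.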
  3. Measures for cloning

    • Number of clone groups/clone instances
    • Size of largest clone/cardinality of most frequent clone
    • Cloned Statements: number of statements in the system being part of at least one clone
    • Clone Coverage
      – #Cloned Statements / #Statements
      – Probability of a randomly chosen statement to be part of a clone
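
A minimal sketch of how these measures could be computed, assuming clone groups are given as lists of inclusive statement index ranges. The function name, the data layout and the example numbers are hypothetical and not taken from the studies.

```python
def clone_measures(total_statements, clone_groups):
    """Compute the cloning measures for one system.

    clone_groups: list of groups; each group is a list of (first, last)
    statement index ranges (inclusive), one range per clone instance.
    """
    cloned = set()
    for group in clone_groups:
        for first, last in group:
            cloned.update(range(first, last + 1))
    sizes = [last - first + 1 for group in clone_groups for first, last in group]
    return {
        "clone_groups": len(clone_groups),
        "clone_instances": len(sizes),
        "largest_clone": max(sizes, default=0),
        "max_cardinality": max((len(g) for g in clone_groups), default=0),
        # A statement counts once, even if it is part of several clones.
        "cloned_statements": len(cloned),
        "clone_coverage": len(cloned) / total_statements,
    }


# Hypothetical system with 100 statements and two overlapping clone groups.
example = clone_measures(100, [[(0, 9), (50, 59)], [(5, 14), (80, 89), (90, 99)]])
print(example["clone_coverage"])  # 0.45
```

Collecting the cloned statements in a set is what makes the coverage a probability: overlapping clones do not inflate the numerator.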
  4. Visualisation of clone detection results

    • Compare View (~20 LOC)
    • Seesoft View (~400 LOC)
    • Tree Maps (>1,000,000 LOC)
    • Trends over Time
  5. How problematic are these inconsistencies (and clones)?

    Indicating harmfulness
    • [Lague97]: inconsistent evolution of clones in industrial telecom. SW.
    • [Monden02]: higher revision number for files with clones in legacy SW.
    • [Kim05]: substantial amount of coupled changes to code clones.
    • [Li06], [SuChiu07] and [Aversano07], [Bakota07]: discovery of bugs through search for inconsistent clones or clone evolution analysis.

    Doubting harmfulness
    • [Krinke07]: inconsistent clones hardly ever become consistent later.
    • [Geiger06]: failure to statistically verify impact of clones on change couplings.
    • [Lozano08]: failure to statistically verify impact of clones on changeability.
    • [Göde11]: most changes intentionally inconsistent.
    • [Rahman12]: no statistically significant impacts on faults.
  6. Our First Study at ICSE 2009

    • Manual inspection of inconsistent clones by system developers
      – No indirect measures of consequences of cloning
    • Both industrial and open source software analysed
    • Quantitative data

    Deissenboeck, Juergens, Hummel, Wagner, ICSE, 2009
  7. Research Questions

    • RQ1: Are clones changed inconsistently? |IC| / |C|
    • RQ2: Are inconsistent clones created unintentionally? |UIC| / |IC|
    • RQ3: Can inconsistent clones be indicators for faults in real systems? |F| / |IC|, |F| / |UIC|

    • C: clone groups (exact and incons.)
    • IC: inconsistent clone groups
    • UIC: unintentionally incons. clone groups
    • F: faulty clone groups
  8. Study Design

    • Clone group candidate detection (tool detected clone group candidates CC)
      – Novel algorithm
      – Tailored to target program
    • False positive removal (CC → C, IC)
      – Manual inspection of all inconsistent and ¼ exact CCs
      – Performed by researchers
    • Assessment of inconsistencies (IC → UIC, F)
      – All inconsistent clone groups inspected
      – Performed by developers

    C: clone groups (exact and incons.); IC: inconsistent clone groups; UIC: unintentionally inconsistent clone groups; F: faulty clone groups
  9. Study Objects

    System    Organization  Language  Age (years)  Size (kLoC)
    A         Munich Re     C#        6            317
    B         Munich Re     C#        4            454
    C         Munich Re     C#        2            495
    D         LV 1871       Cobol     17           197
    Sysiphus  TUM           Java      8            281

    • International reinsurance company, 37,000 employees
    • Munich-based life-insurance company, 400 employees
    • Sysiphus: open source collaboration environment for distributed SW development. Developed at TUM.
  10. Results

    Project                A    B    C    D    Sys.  Sum
    Clone groups |C|       286  160  326  352  303   1427
    Inconsistent CGs |IC|  159  89   179  151  146   724
    Unint. incons. |UIC|   51   29   66   15   42    203
    Faulty CGs |F|         19   18   42   5    23    107
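
The ratios named in the research questions can be derived from the counts on the Results slide. The sketch below only redoes that arithmetic; the variable and function names are illustrative.

```python
# Counts per system as listed on the Results slide (A, B, C, D, Sysiphus).
results = {
    "C":   [286, 160, 326, 352, 303],
    "IC":  [159, 89, 179, 151, 146],
    "UIC": [51, 29, 66, 15, 42],
    "F":   [19, 18, 42, 5, 23],
}
totals = {name: sum(counts) for name, counts in results.items()}


def ratio(numerator: str, denominator: str) -> float:
    """Overall ratio |numerator| / |denominator| across all five systems."""
    return totals[numerator] / totals[denominator]


rq1 = ratio("IC", "C")        # share of inconsistently changed clone groups
rq2 = ratio("UIC", "IC")      # share of unintentional inconsistencies
rq3_ic = ratio("F", "IC")     # faulty among inconsistent clone groups
rq3_uic = ratio("F", "UIC")   # faulty among unintentionally inconsistent ones

print(f"RQ1 |IC|/|C|   = {rq1:.2f}")      # 0.51
print(f"RQ2 |UIC|/|IC| = {rq2:.2f}")      # 0.28
print(f"RQ3 |F|/|IC|   = {rq3_ic:.2f}")   # 0.15
print(f"RQ3 |F|/|UIC|  = {rq3_uic:.2f}")  # 0.53
```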
  11. Our Second Study

    • Investigating evolution of type-3 clones
    • Relationship with documented faults from issue tracker
    • Industrial systems

    Under review for SANER 2016
  12. Research Questions

    • RQ1: Do software systems contain type-3 clones? |CT3| / |C|
    • RQ2: Do type-3 clones contain documented faults? |CT3 F| / |CT3|
    • RQ3: Are developers aware of type-3 clones? |IMS| / |IM|, interviews with key developers

    • C: clone groups (exact and incons.)
    • CT3: inconsistent clone groups
    • CT3 F: faulty clone groups
  13. Data Collection and Analysis

    [Figure: process diagram with the labels v1, v2, v3; Extract; Analyse; Query for relationships and evolution; HTML dashboard]
  14. Study Objects

    System  Size (KLOC)  Age (years)  Developers
    A       253          4            10
    B       332          5            5
    C       454          4            10

    Java, automotive domain
  15. Quantitative Results

    System                                          A     B     C     Overall
    Share of type-3 clones                          0.56  0.23  0.79  0.52
    Share of faulty type-3 clone classes            0.33  0.05  0.03  0.17
    Share of simultaneously modified type-3 clones  0.58  0.89  0.92  0.85
  16. Qualitative Results

    Marks (x) per aspect across systems A, B, C:
    • General clone awareness: x
    • No general clone awareness: x x
    • No specific clone awareness: x x
    • No clone check while bug fixing: x x
    • Clone warning while developing: x
    • Common code ownership: x
    • Discussion about co-changes: x
  17. Conclusions

    • About half of all clone classes are type-3 clones.
    • Rate of faulty type-3 clones is about 17 %.
    • There is a difference in awareness of clones and inconsistencies.
    • This awareness seems to impact how many faults are related to type-3 clones.
    • Further studies should take this into account.
    • Making developers aware of clones still seems to be worthwhile.
  18. "Redundancy [in requirements specifications] causes good engineers to suffer and the resulting systems will probably suffer, too."

    – Matthias Weber, Joachim Weisbrod
  19. Terms

    Requirements specification
    “specification for a particular software product, program, or set of programs that performs certain functions in a specific environment.” [IEEE 830-1998]

    Clone
    • Duplicated specification text of at least 20 words
    • Small differences (e.g., declension) are tolerated
    • Must refer to specified system
    • False positives: e.g., page footers with copyright information
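
The clone definition above (duplicated text of at least 20 words) can be sketched as a sliding window over the word sequence. This is a simplified illustration, not the detector used in the study: the function name is hypothetical, matching is exact after lower-casing, and small differences such as inflected word forms are not tolerated here.

```python
import re
from collections import defaultdict


def clone_candidates(text: str, min_words: int = 20) -> dict[tuple[str, ...], list[int]]:
    """Map every word window of length min_words that occurs more than once
    to the word positions at which it starts."""
    words = re.findall(r"\w+", text.lower())
    positions = defaultdict(list)
    for i in range(len(words) - min_words + 1):
        positions[tuple(words[i:i + min_words])].append(i)
    return {window: starts for window, starts in positions.items() if len(starts) > 1}


# Hypothetical specification text; a shorter window is used for the demo.
spec = (
    "The system shall log every failed login attempt. "
    "Unrelated sentence here. "
    "The system shall log every failed login attempt."
)
found = clone_candidates(spec, min_words=8)
for window, starts in found.items():
    print(starts, " ".join(window))
```

Overlapping windows of a longer duplicated passage would be reported separately; a real detector would merge them into one clone and then apply filters for false positives such as copyright footers.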
  20. Research questions

    1. How much cloning do real-world requirements specifications contain?
    2. What kind of information is cloned in requirements specifications?
    3. What consequences does cloning in requirements specifications have?
    4. Can cloning in requirements specifications be detected accurately using existing clone detectors?
  21. Study design

    • Random assignment of specifications
    • Detection tool execution
    • Inspection of detected clones
    • False positives?
      – Yes: adding of filters
      – No: categorisation of clones
    • Independent re-categorisation
    • Analysis of corresp. source code
    • Data analysis & interpretation
  23. Categorisation of clones

    • Qualitative analysis: content analysis
    • Sample is categorised
    • Mix of theory-based and Grounded Theory
    • 4+8 categories
    • Documentation of additional information (mostly inconsistencies between clones)
  25. Independent re-categorisation

    • 2 raters
    • Sample: 5 specifications
    • Sample: 5 clone groups
    • Analysis of inter-rater agreement
  27. Study objects

    • 28 specifications
    • 11 organisations
    • 8,667 pages
    • over 1.2 million words
    • English & German
    • Domains: automotive, avionics, finance, telecommunication, transport
  28. Typical Clones

    • Entire use cases copied
    • Similar combinations of pre- and postconditions copied
    • Descriptions of terms or roles copied

    Example (translated from German), 42 instances (61 words, 13 instances with > 100 words):

    “The contracts with the clients describe the conditions regarding obligatory liabilities that the clients have agreed on with X. The liabilities are calculated from the exposures from Y and the contract conditions from X. The liability-relevant parts of the contracts thus need to be managed in system Z.”
  29. 1. How much cloning do real-world requirements specifications contain?

    [Bar chart: clone coverage in percent for each of the 28 specifications; values range from 0 to 71.6; mean 13.6 %]
  30. 2. What kind of information is cloned?

    [Bar chart: percentage of clones per category, more than one category possible; values range from 1 to 24. Categories: Use case step, Reference, UI, Domain knowledge, Interface description, Precondition, Side condition, Configuration, Feature, Techn. domain knowledge, Postcondition, Rationale]
  31. 3. What consequences does cloning have?

    [Bar chart: additional effort in hours per inspector for each of the 28 specifications; values range from 0 to 36.7; mean 6]
  32. Modification

    • Multiple inconsistent specification clones identified
    • Differences suspected to be unintentional
    ⇒ Indication that inconsistent updates happen in practice

    Implementation
    Traced specification clone groups to implementation. 3 cases:
    • Shared abstraction
    • Cloned code
    • Independent reimplementation of similar functionality
    ⇒ Indication that spec. cloning causes redundancy in implementation
  33. 4. Can cloning be detected accurately using existing clone detectors?

    [Bar chart: precision in percent per specification, before and after tailoring; before tailoring the values range from 2 to 100, after tailoring from 85 to 100]
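
How tailoring can raise precision is sketched below under simple assumptions: detected clones are plain strings, relevance comes from a manual label per clone, and tailoring means adding regular-expression filters for known false positives such as copyright footers. The data and all names are hypothetical.

```python
import re


def precision(detected, is_relevant):
    """Share of detected clones that are relevant; None if nothing was detected."""
    if not detected:
        return None
    return sum(1 for clone in detected if is_relevant(clone)) / len(detected)


def tailor(detected, filter_patterns):
    """Drop every detected clone that matches one of the filter patterns."""
    compiled = [re.compile(p, re.IGNORECASE) for p in filter_patterns]
    return [c for c in detected if not any(p.search(c) for p in compiled)]


# Hypothetical detector output with manual relevance labels.
labels = {
    "Copyright 2009 Example Corp. All rights reserved.": False,
    "The user must be logged in before the order is placed.": True,
    "Page footer: Copyright 2009 Example Corp.": False,
    "The system stores the order and confirms it to the user.": True,
}
detected = list(labels)

before = precision(detected, labels.get)
after = precision(tailor(detected, [r"copyright"]), labels.get)
print(before, after)  # 0.5 1.0
```

In the study design the filters are added iteratively after inspecting the detected clones; the sketch shows only a single round.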
  34. Threats to validity

    Internal
    • Pairs of researchers to reduce errors during manual steps
    • Reading speeds for cloned vs non-cloned text? Assumed similar. Further research required
    • Recall unclear. But: does not affect study results

    External
    • Substantial differences between requirements specifications (format, organisation, language, …). But: large amount of study objects from different companies, domains
  35. Conclusion

    Lessons Learned
    • Many specs contain cloning
    • Negative impact on reading and inspection effort
    • Indication for corresponding redundancy in source code
    • Cloning not necessary: many specs contain none
    • Tailoring required but feasible: effort small w.r.t. inspection overhead

    Future Work
    • How can cloning be avoided or removed?
    • What are the causes for cloning? Different from those for code clones?
    • Further studies on consequences for implementation
  36. General Idea

    • Execute candidate code fragments on random input
    • Compare output

    [Figure: detection pipeline with the labels Source Code, Libraries, Type-4 Clones]

    Deissenboeck, Heinemann, Hummel, Wagner, CSMR 2012
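
The general idea can be sketched as follows. This is an illustration under strong assumptions, not the pipeline from the CSMR 2012 paper: the candidates are plain Python functions with the same signature, inputs come from a hypothetical generator, and any exception counts as a mismatch.

```python
import random


def io_similar(f, g, make_input, runs=100, seed=0):
    """Run both candidates on the same random inputs and compare their outputs.

    Returns True only if the outputs agree on every generated input.
    """
    rng = random.Random(seed)
    for _ in range(runs):
        args = make_input(rng)
        try:
            if f(*args) != g(*args):
                return False
        except Exception:
            # Simplification: a crash in either candidate counts as a mismatch.
            return False
    return True


def sum_loop(n):
    total = 0
    for i in range(n + 1):
        total += i
    return total


def sum_formula(n):
    return n * (n + 1) // 2


def square(n):
    return n * n


def small_int(rng):
    """Hypothetical input generator: one integer argument between 0 and 1000."""
    return (rng.randint(0, 1000),)


print(io_similar(sum_loop, sum_formula, small_int))  # True
print(io_similar(sum_loop, square, small_int))       # False
```

Agreement on random inputs is only evidence of functional similarity, not a proof, and the result depends on how well the generated inputs exercise the code.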
  37. Study Objects

    System             SLOC
    Commons Lang       17,504
    Freemind           51,762
    Jabref             74,586
    Jetty              29,8
    JHotDraw           78,902
    Info1 submissions  8 – 55
  38. Discussion

    • Low results: no type-4 clones or flaws in detection?
    • Main limitation: random testing approach
      – No input, or generated input does not achieve sufficient code coverage
    • Notion of I/O similarity may not be suitable
      – e.g., different data types or signatures
    • Further research required to quantify these problems
  39. So how are functionally similar clones (FSC) different?

    • RQ 1: What share of independently developed similar programs are type-1–3 clones?
    • RQ 2: What are the differences between FSC that go beyond type-1–3 clones?
    • RQ 3: What share of FSC can be detected by a type-4 clone detector?
    • RQ 4: What should a benchmark contain that represents the differences between FSC?

    Wagner et al., PeerJ Preprints, https://dx.doi.org/10.7287/peerj.preprints.1516v1
  40. Data Collection and Analysis

    [Figure: process diagram with the labels Sol. 1, Sol. 2, Sol. 3; Extract; Analyse; HTML dashboard; Share of syntactic similarity; Manual qualitative analysis; CCCD; Deckard; Characteristics of differences; Benchmark]
  41. How syntactically similar are FSC?

    [Bar chart: share in % for Java and C, each with ConQAT (partial), Deckard (partial), ConQAT (full), Deckard (full); values range from 0 to 11.53]
  42. What are other differences in FSC?

    • Categories: Algorithms, Input/output, Libraries, Object-oriented design, Data structures
    • Degree of difference: low, medium, high
  43. What can CCCD detect?

    Full and partial clone recall me… sets for CCCD (in %)

             Mean   SD
    Partial  16.03  0.07
    Full     0.10   0.00

    …pret this result such that also contempo… …n tools have still problems detecting…
  44. A Benchmark to represent these differences

    …changing structure nor algorithm, the code clon… realistic than fully artificial copies where one… …fied as part of a study.

    [Figure 4: Structure of the benchmark set (over…; levels: Benchmark; Language (Java, C); Category (Data, OO-Design, ...); Degree of Diff. (low, medium, high); Clone Kind (partial, full); Solution (file): left, right]

    Available at: https://github.com/SE-Stuttgart/clone-study
  45. Conclusions

    Lessons Learned
    • Independently developed FSCs have very little syntactic similarity.
    • Type-1–3 detectors will not reliably detect them.
    • Newer approaches, such as CCCD, improve that but not by much.

    Future Work
    • Future proposals for type-4 detectors can use the categorisation and benchmarks as a "to-do" list.
    • Probably a combination of static and dynamic analyses is needed.
  46. Outlook

    • Other artefacts: test cases
    • Effects and costs of cloning
    • Functionally similar code detector