This talk covers two aspects that we recently studied empirically: (1) the relationship between type-3 clones and faults and (2) functionally similar clones (type-4).
live-blog and tweet this presentation, provided that you attribute it to its author and respect the rights and licences of its parts. Based on templates by @SMEasterbrook and @ethanwhite
(except for whitespace and comments)
Type 2: a syntactically identical copy; only variable, type, or function identifiers have been changed
Type 3: a copy with further modifications; statements have been changed, added, or removed
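The distinction between the clone types can be illustrated with a minimal sketch. It assumes a toy regex tokenizer and a deliberately tiny keyword set (both hypothetical, not what any real clone detector uses) and it does not strip comments. Two fragments count as a type-2 match here if their token sequences are equal after every identifier is replaced by a placeholder.

```python
import re

# Hypothetical, deliberately tiny keyword set; a real detector would use a full lexer.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}

# Identifiers/keywords, integer literals, or any other single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(code):
    """Split code into tokens; whitespace is dropped by the tokenizer."""
    return TOKEN.findall(code)


def normalize(code):
    """Replace every non-keyword identifier with the placeholder 'ID'."""
    return [
        "ID" if re.match(r"[A-Za-z_]", t) and t not in KEYWORDS else t
        for t in tokens(code)
    ]


def is_exact_copy(a, b):
    """Same tokens, ignoring whitespace only (comments are not handled here)."""
    return tokens(a) == tokens(b)


def is_renamed_copy(a, b):
    """Same tokens once identifiers are normalized (type-2 style match)."""
    return normalize(a) == normalize(b)


original = "int total = a + b;"
renamed = "int   sum = x + y;"      # only identifiers and spacing differ
modified = "int sum = x + y + 1;"   # statement changed: a type-3 candidate

print(is_exact_copy(original, renamed))     # False
print(is_renamed_copy(original, renamed))   # True
print(is_renamed_copy(original, modified))  # False
```

A type-3 detector would additionally have to tolerate changed, added, or removed statements, for example by allowing a bounded edit distance between the normalized sequences.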
Measures for cloning
clone/cardinality of most frequent clone
• Cloned Statements: number of statements in the system that are part of at least one clone
• Clone Coverage
  – #Cloned Statements / #Statements
  – Probability that a randomly chosen statement is part of a clone
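As a rough sketch of how these two measures could be computed: the representation below (a clone instance as a hypothetical `(file, first_statement, last_statement)` triple with inclusive statement indices) and the example numbers are assumptions for illustration, not the format or data of any particular tool or study.

```python
def cloned_statements(clone_groups):
    """Return the set of (file, statement_index) pairs covered by at least one clone."""
    covered = set()
    for group in clone_groups:
        for file_name, first, last in group:
            # Overlapping clone instances are counted once because we collect into a set.
            covered.update((file_name, i) for i in range(first, last + 1))
    return covered


def clone_coverage(clone_groups, total_statements):
    """Clone coverage = #cloned statements / #statements."""
    return len(cloned_statements(clone_groups)) / total_statements


# Hypothetical example system with 40 statements and two clone groups.
groups = [
    [("A.java", 0, 4), ("B.java", 10, 14)],
    [("A.java", 3, 6), ("C.java", 0, 3)],
]

print(len(cloned_statements(groups)))  # 16
print(clone_coverage(groups, 40))      # 0.4
```

Statements 3 and 4 of `A.java` belong to both groups but contribute only once, which is what makes the ratio interpretable as a probability.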
inconsistent evolution of clones in industrial telecom. SW.
[Monden02]: higher revision number for files with clones in legacy SW.
[Kim05]: substantial amount of coupled changes to code clones.
[Li06], [SuChiu07] and [Aversano07], [Bakota07]: discovery of bugs through search for inconsistent clones or clone evolution analysis.
Doubting harmfulness
[Krinke07]: inconsistent clones hardly ever become consistent later.
[Geiger06]: failure to statistically verify impact of clones on change couplings.
[Lozano08]: failure to statistically verify impact of clones on changeability.
[Göde11]: most changes intentionally inconsistent.
[Rahman12]: no statistically significant impact on faults.
inconsistent clones by system developers
• No indirect measures of consequences of cloning
• Both industrial and open source software analysed
• Quantitative data
Deissenboeck, Juergens, Hummel, Wagner, ICSE, 2009
RQ2: Are inconsistent clones created unintentionally? |UIC| / |IC|
RQ3: Can inconsistent clones be indicators for faults in real systems? |F| / |IC|, |F| / |UIC|
Clone groups C (exact and inconsistent)
Inconsistent clone groups IC
Unintentionally inconsistent clone groups UIC
Faulty clone groups F
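The ratios behind RQ2 and RQ3 are plain set cardinalities. The sketch below shows the arithmetic on made-up clone group identifiers; the sets and the resulting numbers are purely hypothetical and are not results of the study.

```python
# Hypothetical clone group ids; F ⊆ UIC ⊆ IC ⊆ C as in the definitions above.
C = set(range(1, 21))            # all clone groups (exact and inconsistent)
IC = {1, 2, 3, 4, 5, 6, 7, 8}    # inconsistent clone groups
UIC = {1, 2, 3, 4}               # unintentionally inconsistent clone groups
F = {1, 2}                       # faulty clone groups

assert F <= UIC <= IC <= C       # sanity check of the subset chain

rq2 = len(UIC) / len(IC)         # share of inconsistencies that were unintentional
rq3_all = len(F) / len(IC)       # share of inconsistent groups that are faulty
rq3_unint = len(F) / len(UIC)    # share of unintentional inconsistencies that are faulty

print(rq2, rq3_all, rq3_unint)   # 0.5 0.25 0.5
```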
candidate detection
• Novel algorithm
• Tailored to target program
False positive removal
• Manual inspection of all inconsistent and ¼ of the exact CCs
• Performed by researchers
Assessment of inconsistencies
• All inconsistent clone groups inspected
• Performed by developers
Clone groups C (exact and inconsistent), inconsistent clone groups IC, unintentionally inconsistent clone groups UIC, faulty clone groups F
Outputs per step: candidate detection → CC; false positive removal → C, IC; assessment of inconsistencies → UIC, F
400 employees
Sysiphus: Open source collaboration environment for distributed SW development. Developed at TUM.

System     Organization  Language  Age (years)  Size (kLoC)
A          Munich Re     C#        6            317
B          Munich Re     C#        4            454
C          Munich Re     C#        2            495
D          LV 1871       Cobol     17           197
Sysiphus   TUM           Java      8            281
/ |C|
RQ2: Do type-3 clones contain documented faults? |CT3 F| / |CT3|
RQ3: Are developers aware of type-3 clones? |IMS| / |IM|, interviews with key developers
Clone groups C (exact and inconsistent)
Inconsistent clone groups CT3
Faulty clone groups CT3 F
No general clone awareness: x x
No specific clone awareness: x x
No clone check while bug fixing: x x
Clone warning while developing: x
Common code ownership: x
Discussion about co-changes: x
clones.
• The rate of faulty type-3 clones is about 17 %.
• There is a difference in awareness of clones and inconsistencies.
• This awareness seems to affect how many faults are related to type-3 clones.
• Further studies should take this into account.
• Making developers aware of clones still seems worthwhile.
detection?
• Main limitation: the random testing approach; either no input can be generated, or the generated input does not achieve sufficient code coverage
• The notion of I/O similarity may not be suitable, e.g., for different data types or signatures
• Further research is required to quantify these problems
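To make the idea of I/O similarity under random testing concrete, here is a minimal sketch. The function names, the input generator, and the number of trials are assumptions for illustration only; this is not the detector evaluated in the talk. Two functions are treated as functionally similar candidates if they return equal outputs on every randomly generated input.

```python
import random


def io_similar(f, g, gen_input, trials=100, seed=0):
    """Return True if f and g agree on `trials` randomly generated inputs.

    Agreement is only evidence of similarity, not proof: the random inputs
    may never exercise the code paths on which f and g differ.
    """
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    for _ in range(trials):
        x = gen_input(rng)
        if f(x) != g(x):
            return False
    return True


def random_int_list(rng):
    """Hypothetical input generator: 1 to 5 small integers."""
    return [rng.randint(-10, 10) for _ in range(rng.randint(1, 5))]


def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total


def sum_builtin(xs):
    return sum(xs)


def sum_plus_one(xs):
    return sum(xs) + 1


print(io_similar(sum_loop, sum_builtin, random_int_list))   # True
print(io_similar(sum_loop, sum_plus_one, random_int_list))  # False
```

Both limitations from the list above show up directly: the sketch needs a suitable `gen_input` for every candidate pair, and it can only compare functions that accept the same kind of input and return directly comparable outputs.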
• RQ 1: What share of independently developed similar programs are type-1–3 clones?
• RQ 2: What are the differences between FSC that go beyond type-1–3 clones?
• RQ 3: What share of FSC can be detected by a type-4 clone detector?
• RQ 4: What should a benchmark contain that represents the differences between FSC?
Wagner et al., PeerJ Preprints, https://dx.doi.org/10.7287/peerj.preprints.1516v1
[Figure 4: Structure of the benchmark set. Levels: Language (Java, C); Category (Data, OO-Design, ...); Degree of difference (low, medium, high); Clone kind (partial, full); Solution (file): left, right.]
Available at: https://github.com/SE-Stuttgart/clone-study
syntactic similarity.
• Type-1–3 detectors will not reliably detect them.
• Newer approaches, such as CCCD, improve on that, but not by much.
Future Work
• Future proposals for type-4 detectors can use the categorisation and benchmarks as a "todo" list.
• Probably a combination of static and dynamic analyses is needed