Refactoring Mining - The key to unlock software evolution

Slide 1

Slide 1 text

Refactoring Mining The key to unlock software evolution Nikolaos Tsantalis Concordia University

Slide 2

Slide 2 text

Thank you IWoR

Slide 3

Slide 3 text

Who am I RefactoringMiner

Slide 4

Slide 4 text

Refactoring Miner

Slide 5

Slide 5 text

Alexander Chatzigeorgiou Eleni Stroulia Zhenchang Xing Fabio Rocha

Slide 6

Slide 6 text

CASCON'13 version 0.0

Slide 7

Slide 7 text

• Ideas inspired by UMLDiff
• Structure of method bodies is ignored
• Method body includes only method calls and field accesses
• Most refactorings detected based on signature matching
• Only precision is provided
• Evaluation included only 3 systems

Slide 8

Slide 8 text

Refactoring motivations

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Why did I stop working on this research? Ref-Finder (ICSM 2010): "The precision and recall on open source projects were 0.74 and 0.96 respectively." "Since these programs did not document refactorings, we created a set of correct refactorings by running REF-FINDER with a similarity threshold (σ=0.65) and manually verified them. We then measured a recall by comparing this set with the results found using a higher threshold (σ=0.85)"
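The evaluation procedure quoted above can be illustrated with a small sketch. It assumes a hypothetical set of ten refactorings that actually occurred, plus made-up tool outputs; none of the identifiers or numbers come from Ref-Finder or its paper. It shows how recall computed against an oracle derived from the same tool's lower-threshold output can look much higher than recall against everything that really happened.

```java
import java.util.HashSet;
import java.util.Set;

public class OracleBias {

    // precision = |reported ∩ oracle| / |reported| (0.0 by convention if nothing is reported)
    static double precision(Set<String> reported, Set<String> oracle) {
        if (reported.isEmpty()) {
            return 0.0;
        }
        Set<String> truePositives = new HashSet<>(reported);
        truePositives.retainAll(oracle);
        return (double) truePositives.size() / reported.size();
    }

    // recall = |reported ∩ oracle| / |oracle| (0.0 by convention if the oracle is empty)
    static double recall(Set<String> reported, Set<String> oracle) {
        if (oracle.isEmpty()) {
            return 0.0;
        }
        Set<String> truePositives = new HashSet<>(oracle);
        truePositives.retainAll(reported);
        return (double) truePositives.size() / oracle.size();
    }

    public static void main(String[] args) {
        // Hypothetical ground truth: ten refactorings really took place.
        Set<String> actual = Set.of("r1", "r2", "r3", "r4", "r5",
                                    "r6", "r7", "r8", "r9", "r10");

        // Hypothetical oracle: the manually verified output of the tool
        // at the lower threshold. The tool never saw r5..r10.
        Set<String> toolDerivedOracle = Set.of("r1", "r2", "r3", "r4");

        // Hypothetical output of the same tool at the higher threshold.
        Set<String> strictOutput = Set.of("r1", "r2", "r3");

        System.out.println("recall vs tool-derived oracle: "
                + recall(strictOutput, toolDerivedOracle)); // prints 0.75
        System.out.println("recall vs actual refactorings: "
                + recall(strictOutput, actual));            // prints 0.3
    }
}
```

Anything the tool misses at both thresholds never enters the oracle, so it cannot count against recall.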

Slide 11

Slide 11 text

Lesson #1 Don't be intimidated by super results. Always question the results

Slide 12

Slide 12 text

Until one day... Danilo Silva Marco Tulio Valente

Slide 13

Slide 13 text

Where was the code?

Slide 14

Slide 14 text

Lesson #2 Always make your code & data available in a repository. You never know what it can enable in the future

Slide 15

Slide 15 text

Open Science initiatives M. T. Baldassarre, N. Ernst, B. Hermann, T. Menzies, R. Yedida Mandatory Data Availability field

Slide 16

Slide 16 text

Let's do an empirical study about refactoring We must study Why We Refactor

Slide 17

Slide 17 text

ICSE'16 version 0.1

Slide 18

Slide 18 text

• Danilo developed the API of RefactoringMiner
• Tooling for checking out and parsing Git commits
• Infrastructure for monitoring GitHub projects
• Automatic generation of emails to contact developers
• A web app for thematic analysis

Slide 19

Slide 19 text

Firehouse interview
• Monitored 124 GitHub projects between June 8th and August 7th, 2015
• Sent 465 emails and received 195 responses (42%)
• +27 commits with a description explaining the reasons
• Compiled a catalogue of 44 distinct motivations for 12 well-known refactoring types

Slide 20

Slide 20 text

Motivation Catalogue Artifact https://github.com/aserg-ufmg/why-we-refactor

Slide 21

Slide 21 text

ICSE'16 rejection Reviewer #1: "A major threat to the research is not discussed or considered, that RefFinder has poor recall (0.24 [31]). The authors did a good job of combating the low-precision by manually inspecting results, the low recall is not discussed or dealt with."

Slide 22

Slide 22 text

Lesson #3 Even ICSE reviewers make mistakes. Don't get disappointed. Improve, advance, re-submit

Slide 23

Slide 23 text

FSE'16 re-submission
• Renamed the tool from RefDetector to RefactoringMiner
• Addressed most of the reviewers' comments

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

RefDiff MSR'17 RefactoringMiner ...

Slide 26

Slide 26 text

The path of Virtue or Vice

Slide 27

Slide 27 text

Rebirth ICSE’18 version 1.0

Slide 28

Slide 28 text

Limitations of previous approaches
1. Dependence on similarity thresholds
   • thresholds need calibration for projects with different characteristics
2. Dependence on built versions
   • only 38% of the change history can be successfully compiled [Tufano et al., 2017]
3. Unreliable oracles for evaluating precision/recall
   • Incomplete (refactorings found in release notes or commit messages)
   • Biased (applying a single tool with two different similarity thresholds)
   • Artificial (seeded refactorings)
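To illustrate the first limitation, here is a minimal sketch of how a fixed similarity threshold decides whether two versions of a method are treated as the same method. The Jaccard token similarity, the token sets, and the class name are assumptions made for illustration, not the metric used by any particular tool. The threshold values 0.65 and 0.85 are the ones mentioned in the Ref-Finder quote earlier in the talk.

```java
import java.util.HashSet;
import java.util.Set;

public class ThresholdSensitivity {

    // Jaccard similarity: |a ∩ b| / |a ∪ b| (two empty sets count as identical).
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    // A threshold-based matcher: the pair is matched only if it is "similar enough".
    static boolean matches(Set<String> before, Set<String> after, double threshold) {
        return jaccard(before, after) >= threshold;
    }

    public static void main(String[] args) {
        // Hypothetical token sets for a method body before and after a change.
        Set<String> before = Set.of("addresses", "new", "Address", "count",
                                    "for", "i", "try", "catch", "return");
        Set<String> after = Set.of("addresses", "new", "Address", "count",
                                   "for", "i", "try", "return", "List");

        // 8 shared tokens out of 10 distinct tokens overall.
        System.out.println("similarity = " + jaccard(before, after));            // prints 0.8
        System.out.println("matched at 0.65: " + matches(before, after, 0.65)); // prints true
        System.out.println("matched at 0.85: " + matches(before, after, 0.85)); // prints false
    }
}
```

The same pair of methods is matched or missed depending only on the chosen threshold, which is why a value tuned on one project may not transfer to another.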

Slide 29

Slide 29 text

Thresholds are banned

Slide 30

Slide 30 text

Before

    private static Address[] createAddresses(int count) {
        Address[] addresses = new Address[count];
        for (int i = 0; i < count; i++) {
            try {
                addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet());
            } catch (UnknownHostException e) {
                e.printStackTrace();
            }
        }
        return addresses;
    }

After

    private static Address[] createAddresses(int count) {
        Address[] addresses = new Address[count];
        for (int i = 0; i < count; i++) {
            try {
                addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet());
            } catch (UnknownHostException e) {
                e.printStackTrace();
            }
        }
        return addresses;
    }

Slide 31

Slide 31 text

    createAddresses(int count) {
        List<Address> addresses = new ArrayList<Address>(count);
        for (int i = 0; i < count; i++) {
            try {
                addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet());
            } catch (UnknownHostException e) {
                e.printStackTrace();
            }
        }
        return addresses;
    }

After Before