Beyond Java: Expanding Refactoring Research to Multiple Programming Languages (IWoR 2019)

Traditionally, academic research on refactoring has focused on Java programs. However, in recent years, other programming languages have been gaining momentum and are being used to build highly successful applications. In this talk, I will first motivate the need to expand refactoring research to support other programming languages. Second, I will argue that this support is not only an engineering and portability issue; on the contrary, it demands major changes in the tools and algorithms previously developed by researchers in the area. Finally, I will present our current effort to evolve and redesign RefDiff, our refactoring detection tool, to work with multiple languages and programming paradigms.

ASERG, DCC, UFMG

May 28, 2019

Transcript

  2. Refactoring Papers
      • Study by Marouane Kessentini, Danny Dig et al.
      • Java dominates
      • More papers use Java than all other languages combined
      • C++ is the 2nd most used language
      Source: Danny Dig's keynote talk at WAPI 2018, https://w-api.github.io/resources/wapi18_dig_refactoring.pdf
  3. Fowler, 2nd ed. (2018)
      "The first edition of this book used Java … [In this 2nd edition] I chose JavaScript to illustrate these refactorings ..."
  4. Firehouse interviews (FSE'16):
      • 195 developers
      • 124 projects
      • 436 refactoring instances
      • 12 refactoring operations
      • 44 reasons for refactoring
  6. Refactoring engines ≈ Java-based IDEs
      • Do we have refactoring engines for other languages?
      • If yes, can we reuse this technique?
      Reference: IEEE TSE'12
  7. JMove ⇒ Java-based tool
      • How to infer dependencies in JavaScript (e.g., using Facebook's Flow)?
      • Can we reuse these ideas to build JSMove?
  8. What features do we have in all languages?
      2. Containment hierarchy:
        • A program is not a flat list of tokens
        • Examples:
          ◦ C: Tokens → Functions → Files
          ◦ Java: Tokens → Methods → Classes → Packages
          ◦ JavaScript: Tokens → Functions → Files
        • Tokens + Containment Hierarchy ⇒ Code Structure Tree (CST)
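      A minimal Java sketch of the containment idea, assuming a hypothetical `Node` class that stores only a kind, a name, and its children (tokens are left out). The same tree type can hold the C, Java, and JavaScript hierarchies listed above; only the node kinds change.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical, simplified containment node: a kind (file, class, method, ...),
      // a name, and the nodes it contains. Tokens are omitted for brevity.
      class Node {
          final String kind;
          final String name;
          final List<Node> children = new ArrayList<>();

          Node(String kind, String name) {
              this.kind = kind;
              this.name = name;
          }

          // Adds a child and returns it, so nested levels can be built fluently.
          Node add(Node child) {
              children.add(child);
              return child;
          }

          // Renders the tree, one node per line, indented by depth.
          String render() {
              StringBuilder sb = new StringBuilder();
              render(sb, 0);
              return sb.toString();
          }

          private void render(StringBuilder sb, int depth) {
              sb.append("  ".repeat(depth)).append(kind).append(' ').append(name).append('\n');
              for (Node child : children) {
                  child.render(sb, depth + 1);
              }
          }
      }
      ```

      For instance, a package `com.example` containing a class `Calculator` with a method `add` renders as three lines, each indented one level deeper than its container; a C file with its functions would use the same class with only two levels.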
  9. CST vs AST
      • The difference is not in the "C" vs the "A", but in the "S":
        ◦ CST ⇒ Structure (or hierarchy), which we have in all languages
        ◦ AST ⇒ Syntax, which is language-specific
      • CSTs are "universal" ASTs
  10. CST vs AST
      • CSTs are also simpler structures than ASTs
      • Argument #1:
        ◦ JDT AST: 112 classes
        ◦ CST: 1 class
      • Argument #2:
        ◦ We need to implement only 4 visitors to generate CSTs from Java ASTs
      • We implemented CSTs for three languages:
        ◦ Java, JavaScript, C (by an MSc student, in a course project)
  11. RefDiff 2.0 Architecture
      [Figure: Program v1 and Program v2 are parsed by a CST plug-in for language X into CST v1 and CST v2; RefDiff compares them and reports refactorings in language X]
  12. RefDiff 2.0 Architecture
      [Figure: same pipeline; RefDiff itself is language agnostic, while each CST plug-in is "simple" to implement]
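      The plug-in split can be sketched in Java as follows. `LanguagePlugin`, `SimpleNode`, `ToyPlugin`, and `DiffCore` are hypothetical names for illustration, not RefDiff's actual API, and the toy language (one `fn <name>` declaration per line) is invented. The point of the sketch is that the core only ever receives CSTs, so supporting a new language means writing one more parser.

      ```java
      import java.util.ArrayList;
      import java.util.LinkedHashMap;
      import java.util.LinkedHashSet;
      import java.util.List;
      import java.util.Map;
      import java.util.Set;

      // Hypothetical plug-in contract: a language only has to turn source files into CST nodes.
      interface LanguagePlugin {
          List<SimpleNode> parse(Map<String, String> sourceFiles);
      }

      // Minimal language-agnostic node: a type and a qualified name.
      record SimpleNode(String type, String qualifiedName) {}

      // Toy plug-in for an imaginary language where each "fn <name>" line declares a function.
      class ToyPlugin implements LanguagePlugin {
          public List<SimpleNode> parse(Map<String, String> sourceFiles) {
              List<SimpleNode> nodes = new ArrayList<>();
              for (Map.Entry<String, String> file : sourceFiles.entrySet()) {
                  nodes.add(new SimpleNode("file", file.getKey()));
                  for (String line : file.getValue().split("\n")) {
                      String trimmed = line.trim();
                      if (trimmed.startsWith("fn ")) {
                          String name = trimmed.substring(3).trim();
                          nodes.add(new SimpleNode("function", file.getKey() + "#" + name));
                      }
                  }
              }
              return nodes;
          }
      }

      // Language-agnostic core: it never sees source code, only CST nodes.
      // Here it only reports removed and added nodes, the raw input that
      // relationship rules (rename, pull up, ...) would then examine.
      class DiffCore {
          static Map<String, List<SimpleNode>> compare(LanguagePlugin plugin,
                                                       Map<String, String> v1,
                                                       Map<String, String> v2) {
              Set<SimpleNode> before = new LinkedHashSet<>(plugin.parse(v1));
              Set<SimpleNode> after = new LinkedHashSet<>(plugin.parse(v2));
              List<SimpleNode> removed = new ArrayList<>();
              for (SimpleNode n : before) if (!after.contains(n)) removed.add(n);
              List<SimpleNode> added = new ArrayList<>();
              for (SimpleNode n : after) if (!before.contains(n)) added.add(n);
              Map<String, List<SimpleNode>> result = new LinkedHashMap<>();
              result.put("removed", removed);
              result.put("added", added);
              return result;
          }
      }
      ```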
  13. CST Nodes
      • Each CST node has:
        ◦ ID
        ◦ Namespace
        ◦ Type: function, method, class, package, file, etc.
        ◦ Parameter list (optional)
        ◦ Tokens
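      One possible single-class encoding of these fields in Java. The field names, the `parent` link, and the reading of ID as the node's identifier within its namespace (e.g., `add(int,int)`) are assumptions of this sketch; it only illustrates that one class can describe nodes of every type.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // A single node class for every language and every node type (sketch).
      // Assumption: "id" is the node's identifier inside its namespace, e.g. "add(int,int)".
      class CstNode {
          final String id;
          final String namespace;          // e.g. "com.example.Calculator." or a file path
          final String type;               // "function", "method", "class", "package", "file", ...
          final List<String> parameters;   // optional: empty for nodes without parameters
          final List<String> tokens;       // tokens of the node's source code
          final List<CstNode> children = new ArrayList<>();
          CstNode parent;                  // null for root nodes

          CstNode(String id, String namespace, String type,
                  List<String> parameters, List<String> tokens) {
              this.id = id;
              this.namespace = namespace;
              this.type = type;
              this.parameters = List.copyOf(parameters);
              this.tokens = List.copyOf(tokens);
          }

          // Attaches a child and records the containment link in both directions.
          CstNode addChild(CstNode child) {
              child.parent = this;
              children.add(child);
              return child;
          }

          // Key that is unique across the whole tree: namespace followed by the local id.
          String qualifiedId() {
              return namespace + id;
          }
      }
      ```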
  14. Pull Up Relationship
      • Same type
      • Same IDs
      • parent(n1) is a subtype of parent(n2)
      [Figure: nodes n1, (n1)' and n2 across the before and after versions]
  15. Pull Up Relationship
      • Same type
      • Same IDs
      • parent(n1) is a subtype of parent(n2)
      • If the language does not have inheritance: subtype(c1, c2) = false, for all c1, c2
      [Figure: nodes n1, (n1)' and n2 across the before and after versions]
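      The three Pull Up conditions can be written as a predicate. In this sketch, `Element` is a hypothetical stand-in for a CST node that records its parent's name, and the subtype relation is injected, so a language without inheritance simply plugs in `(c1, c2) -> false`. A complete detector would likely check more facts; this covers only the conditions listed on the slide.

      ```java
      import java.util.HashSet;
      import java.util.Map;
      import java.util.Set;
      import java.util.function.BiPredicate;

      // Hypothetical stand-in for a CST node: type, identifier, and its parent's name.
      record Element(String type, String id, String parent) {}

      class PullUpRule {
          // Subtype relation supplied per language (design assumption of this sketch).
          private final BiPredicate<String, String> subtype;

          PullUpRule(BiPredicate<String, String> subtype) {
              this.subtype = subtype;
          }

          // n1 comes from the "before" version, n2 from the "after" version.
          boolean isPullUp(Element n1, Element n2) {
              return n1.type().equals(n2.type())
                      && n1.id().equals(n2.id())
                      && subtype.test(n1.parent(), n2.parent());
          }

          // No inheritance: subtype(c1, c2) = false for all c1, c2, so the rule never fires.
          static PullUpRule withoutInheritance() {
              return new PullUpRule((c1, c2) -> false);
          }

          // Subtype relation derived from a map of class -> direct supertypes (transitive).
          static PullUpRule fromHierarchy(Map<String, Set<String>> supertypes) {
              return new PullUpRule((c1, c2) -> isSubtype(supertypes, c1, c2, new HashSet<>()));
          }

          private static boolean isSubtype(Map<String, Set<String>> sup, String c1, String c2,
                                           Set<String> seen) {
              if (!seen.add(c1)) return false; // guard against cyclic hierarchies
              for (String s : sup.getOrDefault(c1, Set.of())) {
                  if (s.equals(c2) || isSubtype(sup, s, c2, seen)) return true;
              }
              return false;
          }
      }
      ```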
  16. Rename Relationship
      • Same type
      • Same containers
      • Different names
      • It's common to have renaming + edits: tokens(n1) ≠ tokens(n2)
      • We need to tolerate edits in the code (n2)
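      A sketch of the Rename rule under stated assumptions: `Decl` is a hypothetical node type, the 0.5 threshold is arbitrary, and plain set-based Jaccard stands in for the weighted similarity of slide 17. The similarity check is what lets the rule tolerate edits made together with the rename.

      ```java
      import java.util.HashSet;
      import java.util.List;
      import java.util.Set;

      // Hypothetical stand-in node: type, container, simple name, and body tokens.
      record Decl(String type, String container, String name, List<String> tokens) {}

      class RenameRule {
          // Arbitrary threshold for this sketch; a real tool would calibrate it.
          static final double THRESHOLD = 0.5;

          // Unweighted Jaccard over token sets, a placeholder for a weighted similarity.
          static double jaccard(List<String> a, List<String> b) {
              Set<String> union = new HashSet<>(a);
              union.addAll(b);
              if (union.isEmpty()) return 1.0;
              Set<String> inter = new HashSet<>(a);
              inter.retainAll(new HashSet<>(b));
              return (double) inter.size() / union.size();
          }

          // n1 from "before", n2 from "after": same type, same container, different
          // names, and bodies similar enough even if tokens(n1) != tokens(n2).
          static boolean isRename(Decl n1, Decl n2) {
              return n1.type().equals(n2.type())
                      && n1.container().equals(n2.container())
                      && !n1.name().equals(n2.name())
                      && jaccard(n1.tokens(), n2.tokens()) >= THRESHOLD;
          }
      }
      ```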
  17. Code Similarity: Weighted Jaccard Coefficient, where:
      • e_i: tokens in a CST node
      • m_i(t): number of occurrences of token t in e_i
      • idf: IDF (inverse document frequency) coefficient
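      The formula itself did not survive in this transcript. Assuming the usual weighted Jaccard form, sim(e1, e2) = Σ_t min(m1(t), m2(t))·idf(t) / Σ_t max(m1(t), m2(t))·idf(t), with t ranging over the tokens in e1 ∪ e2, a Java sketch follows. The IDF variant idf(t) = log(1 + N / n(t)), where N is the number of nodes and n(t) the number of nodes containing t, is also an assumption here.

      ```java
      import java.util.HashMap;
      import java.util.HashSet;
      import java.util.List;
      import java.util.Map;
      import java.util.Set;

      class WeightedJaccard {
          // Token multiplicities m_i(t) for one CST node e_i.
          static Map<String, Integer> counts(List<String> tokens) {
              Map<String, Integer> m = new HashMap<>();
              for (String t : tokens) m.merge(t, 1, Integer::sum);
              return m;
          }

          // Assumed IDF variant: idf(t) = log(1 + N / n(t)).
          static Map<String, Double> idf(List<List<String>> allNodes) {
              Map<String, Integer> df = new HashMap<>();
              for (List<String> node : allNodes) {
                  for (String t : new HashSet<>(node)) df.merge(t, 1, Integer::sum);
              }
              int n = allNodes.size();
              Map<String, Double> weights = new HashMap<>();
              df.forEach((t, k) -> weights.put(t, Math.log(1.0 + (double) n / k)));
              return weights;
          }

          // Sum of min(m1, m2) * idf divided by sum of max(m1, m2) * idf over e1 ∪ e2.
          static double similarity(List<String> e1, List<String> e2, Map<String, Double> idf) {
              Map<String, Integer> m1 = counts(e1);
              Map<String, Integer> m2 = counts(e2);
              Set<String> union = new HashSet<>(m1.keySet());
              union.addAll(m2.keySet());
              double num = 0, den = 0;
              for (String t : union) {
                  int a = m1.getOrDefault(t, 0);
                  int b = m2.getOrDefault(t, 0);
                  double w = idf.getOrDefault(t, 1.0); // unseen tokens weigh 1 (assumption)
                  num += Math.min(a, b) * w;
                  den += Math.max(a, b) * w;
              }
              return den == 0 ? 1.0 : num / den;
          }
      }
      ```

      Because rare tokens get larger weights, sharing an uncommon identifier counts more toward similarity than sharing a keyword such as `return`.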
  18. "This is only an engineering problem" is a false argument, at least when detecting refactorings
  19. We also need call graphs
      • To detect refactorings, it is also useful to have a call graph
      • Call graph: node A calls node B
      • We also embedded a lightweight call graph in CSTs
      [Figure: call graph]
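      A lightweight call graph can be kept as edges between node identifiers. This `CallGraph` class is a hypothetical sketch, not RefDiff's data structure, and assumes nodes are referenced by qualified IDs; the reverse lookup shows one possible use, such as listing which nodes call a given function before and after a change.

      ```java
      import java.util.HashMap;
      import java.util.LinkedHashSet;
      import java.util.Map;
      import java.util.Set;
      import java.util.TreeSet;

      // Sketch: "A calls B" edges stored alongside the CST, keyed by node identifier.
      class CallGraph {
          private final Map<String, Set<String>> callees = new HashMap<>();

          // Records that node `caller` calls node `callee`.
          void addCall(String caller, String callee) {
              callees.computeIfAbsent(caller, k -> new LinkedHashSet<>()).add(callee);
          }

          // True if node a calls node b directly.
          boolean calls(String a, String b) {
              return callees.getOrDefault(a, Set.of()).contains(b);
          }

          // Reverse lookup: all nodes that call b, sorted for stable output.
          Set<String> callersOf(String b) {
              Set<String> result = new TreeSet<>();
              callees.forEach((a, targets) -> {
                  if (targets.contains(b)) result.add(a);
              });
              return result;
          }
      }
      ```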