Guide Refactorings With Behavioral Code Analysis

Many codebases contain code that is overly complicated, hard to understand, and hence expensive to change. The pressure of new features and user needs makes it hard to stop and backtrack, and the longer we wait, the worse it gets. Mix in the people side, with frequent organizational change, and the siren song of a system rewrite becomes more and more attractive. It doesn't have to be that way. In this presentation you'll see how easily obtained version-control data lets us uncover the behavior and patterns of the development organization. These behavioral code analysis techniques provide a sweet spot for prioritizing and guiding refactorings. We cover refactoring techniques that reduce excess complexity and address hidden, implicit dependencies, and we discuss architectural restructurings that reduce inter-team coordination needs. Since behavioral code analysis also lets us consider the social side of code, such as refactoring modules that are under development by our peers, we explore novel patterns that help us limit risks and code conflicts. The specific examples are from real-world codebases like Android, the Linux Kernel, ASP.NET Core MVC, and more.

Adam Tornhill

May 10, 2018

Transcript

  1. Guide Refactorings
    With Behavioral Code Analysis
    @AdamTornhill craft-conf.com

  2. The Human Potential
    Scale: Hard to Impossible
    Land on the Moon
    Sequence the Human Genome
    Revive the Dinosaurs
    The Pyramids
    Fusion Power
    Understand Consciousness
    Modify someone else’s C++ code

  3. Lehman’s “Laws” of Software Evolution
    Continuing Change
    “a system must be continually adapted or it becomes progressively less satisfactory”
    Increasing Complexity
    “as a system evolves, its complexity increases unless work is done to maintain or reduce it”
    M. Lehman, “Programs, life cycles, and laws of software evolution”, 1980

  4. The Two Forms Of Accidental Complexity
    Complex Parts
    Complex Inter-Dependencies

  5. Behavioral Code Analysis - What Is It?
    Code: Important
    Evolution and Behavior: More Important

  6. Version-Control: A Behavioral Data Source
    Commit: b557ca5
    Date: 2016-02-12
    Author: Kevin Flynn
    Fix behavior of StartsWithPrefix
    8 27 src/Mvc.Abstractions/ModelBinding/ModelStateDictionary.cs
    1 10 src/Mvc.Core/ControllerBase.cs
    1 1 src/Mvc.Core/Internal/ElementalValueProvider.cs
    1 39 src/Mvc.Core/Internal/PrefixContainer.cs
    Commit: fd6d28d
    Date: 2016-02-10
    Author: Professor Falken
    Make AddController not overwrite existing IControllerTypeProvider
    8 1 src/Core/Internal/ControllersAsServices.cs
    48 0 test/Core.Test/Internal/ControllerAsServicesTest.cs
    13 0 test/Mvc.FunctionalTests/ControllerFromServicesTests.cs
    Commit: 910f013
    Date: 2016-02-05
    Author: Lisbeth Salander
    Fixes #4050: Throw an exception when media types are empty.
    20 1 src/Mvc.Core/Formatters/InputFormatter.cs
    Social Information
    A Time Dimension
    Progress on Tasks
    Co-changing Files
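
A log like the one on this slide can be mined with a few lines of code. The following is a minimal sketch, assuming the history was exported with `git log --numstat --date=short --pretty=format:'--%h--%ad--%aN'`. That export format and the function names `parse_log` and `revisions_per_file` are illustrative assumptions, not something shown on the slide; the sample rows reuse two of the slide's commits.

```python
from collections import Counter

# Two commits from the slide, in the assumed export format:
# a "--<hash>--<date>--<author>" header, then tab-separated numstat rows.
SAMPLE_LOG = """\
--b557ca5--2016-02-12--Kevin Flynn
8\t27\tsrc/Mvc.Abstractions/ModelBinding/ModelStateDictionary.cs
1\t10\tsrc/Mvc.Core/ControllerBase.cs

--910f013--2016-02-05--Lisbeth Salander
20\t1\tsrc/Mvc.Core/Formatters/InputFormatter.cs
"""


def parse_log(text):
    """Turn the exported log text into a list of commit dictionaries."""
    commits = []
    current = None
    for line in text.splitlines():
        if line.startswith("--"):
            # Header line: start a new commit record.
            rev, date, author = line[2:].split("--", 2)
            current = {"rev": rev, "date": date, "author": author, "files": []}
            commits.append(current)
        elif line.strip() and current is not None:
            # Numstat row: added lines, deleted lines, path.
            added, deleted, path = line.split("\t", 2)
            current["files"].append((added, deleted, path))
    return commits


def revisions_per_file(commits):
    """Count in how many commits each file was touched (change frequency)."""
    counts = Counter()
    for commit in commits:
        counts.update(path for _, _, path in commit["files"])
    return counts
```

Each commit record carries the social information (author), the time dimension (date), and the set of co-changing files that the later analyses build on.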

  7. Prefer Simple Metrics
    because Metrics are a Guide, not a Replacement for Expertise

  8. Extract The Signal via a Human in the Loop
    View Complexity Through the Lens of Behavioral Data

  9. Case Study: Android
    The Platform Framework Base in Numbers
    3 Million Lines of Code
    2.1 Million Lines of Java
    2,000 Unique Authors

  10. Case Study: Android
    The Platform Framework Base in Numbers
    3 Million Lines of Code
    2.1 Million Lines of Java
    2,000 Unique Authors

  11. Case Study: Android
    Code Complexity
    Code Change Frequency
    Hotspot
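
A hotspot is where the two dimensions on this slide overlap: complicated code that also changes often. One way to sketch the idea is to rank files by change frequency first and by size second. The ordering rule, the function name `rank_hotspots`, and the use of lines of code as the complexity proxy are assumptions made for this illustration.

```python
def rank_hotspots(revisions, lines_of_code, top=10):
    """Rank files by change frequency, breaking ties by size.

    revisions:     mapping of file path -> number of commits touching it
    lines_of_code: mapping of file path -> current lines of code
    """
    candidates = [
        (path, revisions[path], lines_of_code[path])
        for path in revisions
        # Files that no longer exist have no current size, so skip them.
        if path in lines_of_code
    ]
    # Most frequently changed first; the larger file wins a tie.
    candidates.sort(key=lambda c: (c[1], c[2]), reverse=True)
    return candidates[:top]
```

A large file that never changes sinks to the bottom of this list, which is the point: stable code is rarely the most urgent refactoring target, however complicated it looks.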

  12. What’s Code Complexity Anyway?
    What we normally care about…
    A simpler view!
    Ref to Python script…
    Implementation: https://github.com/adamtornhill/maat-scripts/blob/master/miner/complexity_analysis.py
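
One simple, language-neutral complexity measure is indentation: deeply nested code is indented more. The sketch below illustrates that idea and is not the code of the linked script. It assumes that four spaces or one tab make up one logical indentation level; the function names are hypothetical.

```python
def indentation_levels(source, spaces_per_level=4):
    """Return the logical indentation level of every non-blank line."""
    levels = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no complexity
        leading = line[: len(line) - len(line.lstrip(" \t"))]
        # Treat one tab as one full indentation level.
        expanded = leading.replace("\t", " " * spaces_per_level)
        levels.append(len(expanded) // spaces_per_level)
    return levels


def complexity_summary(source):
    """Summarize indentation-based complexity for one file's text."""
    levels = indentation_levels(source)
    if not levels:
        return {"lines": 0, "total": 0, "mean": 0.0, "max": 0}
    return {
        "lines": len(levels),
        "total": sum(levels),
        "mean": sum(levels) / len(levels),
        "max": max(levels),
    }
```

Because the measure only looks at whitespace, the same function can be run over Java, C, or C# files without a parser, which makes it cheap to compute for every historic revision of a hotspot.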

  13. Case Study: Android
    20,097 Lines of Code
    2,009 Commits

  14. Trade-Offs: Improvements vs New Features

  15. Programming As If Social Factors Mattered
    Author #1 Author #2 Author #N
    The Relative Contributions of Each Author
    contributors
    Fractal Value: M. D’Ambros, M. Lanza, and H. Gall. Fractal Figures: Visualizing Development Effort for CVS Entities.
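
The fractal value condenses the relative contributions of each author into one number. A minimal sketch follows, assuming the usual definition of one minus the sum of squared contribution shares, where a share is an author's commits divided by all commits to the file. The function name and the input shape are illustrative.

```python
def fractal_value(commits_per_author):
    """Fragmentation of development effort for one file.

    commits_per_author: mapping of author name -> number of commits.
    Returns 0.0 for a single author and approaches 1.0 as the work
    is spread evenly over more and more authors.
    """
    total = sum(commits_per_author.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in commits_per_author.values())
```

For example, a file with two equally active authors scores 0.5, and four equally active authors score 0.75. A high value on a hotspot hints at a coordination bottleneck.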

  16. The Splinter Pattern
    The splinter pattern provides a structured way to break up hotspots into
    manageable pieces that can be divided among several developers to work on,
    rather than having a group of developers work on one large piece of code.
    https://pragprog.com/book/atevol/software-design-x-rays

  17. The Splinter Pattern
    Original Context: ActivityStack.java
    (Stack behavior, Translucence behavior, Ownership behavior, Lifecycle behavior)
    Splinter Context: ActivityStack.java
    delegate: Translucence.java
    delegate: DeviceOwnership.java
    delegate: ActivityLifecycle.java
    Unmodified, copy-pasted
    (yes, really) content

  18. Splinter: Resulting Context
    ActivityStack.java
    Translucence.java
    DeviceOwnership.java
    ActivityLifecycle.java
    Better alignment between problem and solution domain
    => facilitates parallel work
    Individual Parts that can be refactored in isolation

  19. Cut The Middle Man
    ActivityStack.java

  20. Measure and Visualize Improvements
    The effects of a
    Splinter refactoring

  21. Methods and Functions: Where Do We Start?

  22. Function Level Hotspots
    Hotspots: X-Ray ActivityManagerService.java
    Parse
    Recommended functions to improve.

  23. X-Ray of ActivityManagerService.java

  24. Tooling: Try it on your own Code
    https://codescene.io/
    Source Code: https://github.com/adamtornhill/code-maat
    Track functions with
    git log -L :<funcname>:<file>
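
Git can trace the history of a single function with its documented `-L :<funcname>:<file>` form. The sketch below wraps that command and counts how often a function changed. The helper names and the example path are hypothetical, and the count relies on the assumption that only commit header lines in the output start with `commit `.

```python
import subprocess


def function_history_command(function_name, file_path):
    """Build the git command that follows one function through history."""
    return ["git", "log", "-L", f":{function_name}:{file_path}"]


def function_history(function_name, file_path, repo_dir="."):
    """Run the command inside repo_dir and return git's textual output."""
    result = subprocess.run(
        function_history_command(function_name, file_path),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git reports an error
    )
    return result.stdout


def count_revisions(log_output):
    """Count the commits listed in the output of `git log -L`."""
    return sum(1 for line in log_output.splitlines()
               if line.startswith("commit "))
```

Inside a checked-out repository, `count_revisions(function_history("handleMessage", "ActivityManagerService.java"))` would give a function-level change frequency, provided the path is adjusted to the file's real location.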

  25. Code Duplication and DRY Violations
    5-20% of all Code is Duplicated to Some Extent

  26. DRY Violations in
    handleMessage (Android)

  27. DRY Violations in
    handleMessage (Android)
    Next refactoring step:
    Design Pattern COMMAND?

  28. The Dirty Secret of Copy Paste
    Image from https://en.wikipedia.org/wiki/Rorschach_test

  29. Change Coupling: Patterns That Emerge Over Time
    Commit #1, Commit #2, Commit #3
    Functions: A(), B(), E()
    Changed code
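
Change coupling can be computed directly from the sets of entities that change in the same commit. The sketch below uses one common definition of coupling strength, shared commits divided by the average number of revisions of the two entities. That definition, the threshold parameter, and the function name are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations


def change_coupling(commits, min_shared=2):
    """Find entities that repeatedly change in the same commits.

    commits: iterable of collections, each holding the names of the
             entities (files or functions) changed in one commit.
    Returns tuples of (entity_a, entity_b, shared_commits, degree_percent).
    """
    revisions = Counter()
    shared = Counter()
    for entities in commits:
        unique = sorted(set(entities))
        revisions.update(unique)
        # Every pair changed in this commit shares one more revision.
        shared.update(combinations(unique, 2))

    results = []
    for (a, b), count in shared.items():
        if count < min_shared:
            continue  # a single co-change may be a coincidence
        average = (revisions[a] + revisions[b]) / 2
        results.append((a, b, count, round(100 * count / average)))
    # Strongest coupling first.
    results.sort(key=lambda r: (r[3], r[2]), reverse=True)
    return results
```

If `A()` and `B()` change together in two commits while `E()` changes alone in a third, the call `change_coupling([["A()", "B()"], ["E()"], ["A()", "B()"]])` reports the pair `A()` and `B()` with two shared commits and a degree of 100 percent.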

  30. Case Study: Code Clones in Linux
    The Linux Kernel in Numbers
    16 Million Lines of Code
    15 Million Lines of C
    15,000 Unique Authors

  31. Inside the Main Hotspot: intel_display.c
    11,383 Lines of Code
    3,040 Commits

  32. Inside the Main Hotspot: intel_display.c
    11,383 Lines of Code
    3,040 Commits

  33.

  34. Inside the Main Hotspot: intel_display.c
    11,383 Lines of Code
    3,040 Commits

  35. A bug due to omission?
    A context-specific check?

  36. Clone Detector Applications
    Combine copy-paste detection techniques with change coupling to identify the code clones that really need refactoring.
    Clone Digger (Java and Python): http://clonedigger.sourceforge.net/
    Simian (.NET): http://www.harukizaemon.com/simian/

  37. Duplication goes Beyond Code Similarity

  38. Case Study: Refactoring Entity Framework Core
    Entity Framework Core in Numbers
    574,000 Lines of Code
    365,000 Lines of C#
    102 Unique Authors

  39. Code Clones in Entity Framework - Refactor?

  40. The Principle of Proximity

  41. The Principle of Proximity in Code

  42. The Proximity Refactoring

  43. Reducing Complex Inter-Dependencies
    Image from https://thedailywtf.com/articles/comments/Enterprise-Dependency-Big-Ball-of-Yarn

  44. The Heuristic of Surprise
    Image from https://thedailywtf.com/articles/comments/Enterprise-Dependency-Big-Ball-of-Yarn

  45. Case Study: jUnit5
    jUnit5 in Numbers
    59,000 Lines of Code
    51,000 Lines of Java
    74 Unique Authors

  46. Unit tests that change together
    in ~70% of all commits

  47. X-Ray of the Coevolving jUnit Tests

  48. Let’s Look at the Code…
    ExceptionHandlingTests.java
    ReportingTests.java
    TestCaseWithInheritanceTests.java

  49. Refactoring jUnit: Express Our Domain

  50. Refactoring jUnit: Express Our Domain

  51. Express Tests in the Language of your Domain;
    Generic Assertions are at Odds with that Goal.

  52. The Costs of Implicit Dependencies
    Increase with Architectural Distance

  53. UI code implemented in JavaScript…
    …change coupled to backend code implemented in Clojure (different repository)

  54. Git Repository
    Analysis Engine
    Backend
    UI
    Data Mining
    Git Repository
    Git Repository
    Change Coupling!

  55. How Can You Track Changes Across Repositories?
    Use a Task ID or Ticket Reference
    A specific commit:
    e9e57e48 2017-05-26 D.Cooper [email protected]
    Bugfix: the owls are not what they seem.
    Task: CLJ-2141
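
When related changes land in separate repositories, no single commit contains them all, so the task ID in the commit message becomes the grouping key. The following is a rough sketch, assuming ticket references that look like `CLJ-2141`. The regular expression, the function name, and the repository and file names in the usage note are hypothetical.

```python
import re
from collections import defaultdict

# Assumed ticket format: uppercase project key, a dash, and a number.
TASK_ID = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def files_by_task(commits):
    """Group changed files from several repositories by task ID.

    commits: iterable of (repository, commit_message, changed_files).
    Returns a mapping of task ID -> set of (repository, file) pairs.
    """
    grouped = defaultdict(set)
    for repo, message, files in commits:
        match = TASK_ID.search(message)
        if match is None:
            continue  # commits without a ticket reference cannot be linked
        for path in files:
            grouped[match.group(1)].add((repo, path))
    return dict(grouped)
```

A backend commit touching `src/core.clj` and a UI commit touching `app/view.js` that both mention `CLJ-2141` end up in the same set. Each set can then be treated as one logical change when calculating change coupling across repository boundaries.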

  56. UI code implemented in JavaScript…
    …change coupled to backend code implemented in Clojure (different repository)

  57. X-Ray Across Repository Boundaries
    JavaScript function…
    …that changes with Clojure business logic.

  58. Let Features Drive Architectural Building Blocks,
    not Technology.

  59. @AdamTornhill
    Blog on Behavioral Code Analysis
    http://www.empear.com/blog/
    Your Code As A Crime Scene
    https://pragprog.com/book/atcrime/your-code-as-a-crime-scene
    Software Design X-Rays
    https://pragprog.com/book/atevol/software-design-x-rays
    Test the Analyses in CodeScene:
    https://codescene.io/
    [email protected]
