
Guide Refactorings With Behavioral Code Analysis

Many codebases contain code that is overly complicated, hard to understand, and hence expensive to change. The pressure of new features and user needs makes it hard to stop and backtrack, and the longer we wait, the worse it gets. Mix in the people side with frequent organizational change, and the siren song of a system rewrite becomes more and more attractive. It doesn't have to be that way. In this presentation you'll see how easily obtained version-control data let us uncover the behavior and patterns of the development organization. These behavioral code analysis techniques provide a sweet spot to prioritize and guide refactorings. We cover refactoring techniques that reduce excess complexity and address hidden implicit dependencies, and we discuss architectural restructuring that reduces inter-team coordination needs. Since behavioral code analysis also lets us consider the social side of code, such as refactoring modules that are under development by our peers, we explore novel patterns that help us limit risks and code conflicts. The specific examples are from real-world codebases like Android, the Linux Kernel, ASP.NET Core MVC, and more.


Adam Tornhill

May 10, 2018


  1. Guide Refactorings With Behavioral Code Analysis @AdamTornhill craft-conf.com

  2. Modify someone else’s C++ code The Human Potential Hard Impossible

    Land on the Moon Sequence the Human Genome Revive the Dinosaurs The Pyramids Fusion Power Understand Consciousness @AdamTornhill
  3. Lehman’s “Laws” of Software Evolution

    Continuing Change: “a system must be continually adapted or it becomes progressively less satisfactory”
    Increasing Complexity: “as a system evolves, its complexity increases unless work is done to maintain or reduce it”
    M. Lehman, “Programs, life cycles, and laws of software evolution”, 1980 @AdamTornhill
  4. The Two Forms Of Accidental Complexity Complex Parts Complex Inter-Dependencies

  5. Behavioral Code Analysis - What Is It? Code: Important Evolution

    and Behavior: More Important @AdamTornhill
  6. Version-Control: A Behavioral Data Source

    Commit: b557ca5 Date: 2016-02-12 Author: Kevin Flynn
    Fix behavior of StartsWithPrefix
    8 27 src/Mvc.Abstractions/ModelBinding/ModelStateDictionary.cs
    1 10 src/Mvc.Core/ControllerBase.cs
    1 1 src/Mvc.Core/Internal/ElementalValueProvider.cs
    1 39 src/Mvc.Core/Internal/PrefixContainer.cs

    Commit: fd6d28d Date: 2016-02-10 Author: Professor Falken
    Make AddController not overwrite existing IControllerTypeProvider
    8 1 src/Core/Internal/ControllersAsServices.cs
    48 0 test/Core.Test/Internal/ControllerAsServicesTest.cs
    13 0 test/Mvc.FunctionalTests/ControllerFromServicesTests.cs

    Commit: 910f013 Date: 2016-02-05 Author: Lisbeth Salander
    Fixes #4050: Throw an exception when media types are empty.
    20 1 src/Mvc.Core/Formatters/InputFormatter.cs

    Social Information, A Time Dimension, Progress on Tasks, Co-changing Files @AdamTornhill
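The commit log on this slide can be mined with very little code. The following is a minimal sketch, assuming the log was exported with `git log --numstat --date=short --pretty=format:'--%h--%ad--%aN'`, so that each commit starts with a `--hash--date--author` header followed by tab-separated `added deleted path` lines. The function names and the shortened sample log are illustrative, not taken from the talk.

```python
from collections import Counter

def parse_numstat_log(log_text):
    """Parse the assumed '--hash--date--author' + numstat format into commits."""
    commits = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("--"):
            # Header line: --<hash>--<date>--<author>
            rev, date, author = line[2:].split("--", 2)
            current = {"rev": rev, "date": date, "author": author, "files": []}
            commits.append(current)
        elif current is not None:
            # numstat line: <added>\t<deleted>\t<path> ('-' for binary files)
            parts = line.split("\t")
            if len(parts) == 3:
                added, deleted, path = parts
                current["files"].append((path, added, deleted))
    return commits

def revisions_per_file(commits):
    """Count in how many commits each file was touched (change frequency)."""
    counts = Counter()
    for commit in commits:
        for path, _, _ in commit["files"]:
            counts[path] += 1
    return counts

# Shortened sample based on the commits shown on the slide.
SAMPLE_LOG = (
    "--b557ca5--2016-02-12--Kevin Flynn\n"
    "8\t27\tsrc/Mvc.Abstractions/ModelBinding/ModelStateDictionary.cs\n"
    "1\t10\tsrc/Mvc.Core/ControllerBase.cs\n"
    "\n"
    "--910f013--2016-02-05--Lisbeth Salander\n"
    "20\t1\tsrc/Mvc.Core/Formatters/InputFormatter.cs\n"
)

commits = parse_numstat_log(SAMPLE_LOG)
revisions = revisions_per_file(commits)
```

Each parsed commit carries the author (social information), the date (time dimension), and the set of files that changed together.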
  7. Prefer Simple Metrics Metrics are a Guide, not a Replacement

    for Expertise because @AdamTornhill
  8. Extract The Signal via a Human in the Loop View

    Complexity Through the Lens of Behavioral Data
  9. Case Study: Android The Platform Framework Base in Numbers

    3 Million Lines of Code, 2.1 Million Lines of Java, 2,000 Unique Authors @AdamTornhill
  10. Case Study: Android The Platform Framework Base in Numbers

    3 Million Lines of Code, 2.1 Million Lines of Java, 2,000 Unique Authors @AdamTornhill
  11. Case Study: Android Code Complexity Code Change Frequency Hotspot @AdamTornhill
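A hotspot is code that is both complex and frequently changed. As a rough sketch, assuming we already have a revision count and a lines-of-code figure per file, candidates can be ranked by change frequency first and size second. The ranking rule, the function name, and the file names below are assumptions for illustration; real tools may weight the two dimensions differently.

```python
def rank_hotspots(revisions, lines_of_code, top=10):
    """Rank files by change frequency, using size as a tie-breaker.

    revisions:     dict mapping path -> number of commits touching the file
    lines_of_code: dict mapping path -> current lines of code
    """
    candidates = []
    for path, revs in revisions.items():
        loc = lines_of_code.get(path)
        if loc is None:
            continue  # file was deleted or not measured
        candidates.append((path, revs, loc))
    # Most frequently changed first; larger files win ties.
    candidates.sort(key=lambda c: (c[1], c[2]), reverse=True)
    return candidates[:top]

# Hypothetical numbers, for illustration only.
hotspots = rank_hotspots(
    {"Small.java": 5, "Busy.java": 50, "BusyAndBig.java": 50},
    {"Small.java": 1000, "Busy.java": 200, "BusyAndBig.java": 900},
)
```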

  12. What’s Code Complexity Anyway?

    What we normally care about… A simpler view! Implementation: https://github.com/adamtornhill/maat-scripts/blob/master/miner/complexity_analysis.py @AdamTornhill
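One simple, language-neutral proxy for complexity is indentation: deeply nested code is indented further. The sketch below illustrates that idea under the assumption that one logical level equals four spaces or one tab. It is not a copy of the linked script, and the function names are made up for this example.

```python
def indentation_levels(source, spaces_per_level=4):
    """Return the logical indentation level of every non-blank line."""
    levels = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no complexity
        # Treat each tab as one full level of spaces before measuring.
        expanded = line.replace("\t", " " * spaces_per_level)
        leading = len(expanded) - len(expanded.lstrip(" "))
        levels.append(leading // spaces_per_level)
    return levels

def complexity_summary(source):
    """Summarize indentation-based complexity for one file."""
    levels = indentation_levels(source)
    if not levels:
        return {"lines": 0, "total": 0, "mean": 0.0, "max": 0}
    return {
        "lines": len(levels),
        "total": sum(levels),
        "mean": sum(levels) / len(levels),
        "max": max(levels),
    }

sample = "if a:\n    if b:\n        c()\n\nd()\n"
summary = complexity_summary(sample)
# summary == {"lines": 4, "total": 3, "mean": 0.75, "max": 2}
```

Running this over each historic revision of a hotspot gives a complexity trend that needs no parser for the language being analyzed.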
  13. Case Study: Android 20,097 Lines of Code 2,009 Commits

  14. Trade-Offs: Improvements vs New Features

  15. Programming As If Social Factors Mattered Author #1 Author #2

    Author #N The Relative Contributions of Each Author contributors Fractal Value: M. D’Ambros, M. Lanza, and H. Gall. Fractal Figures: Visualizing Development Effort for CVS Entities.
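The fractal value condenses the relative contributions of each author into one number. A minimal sketch, assuming the commonly cited definition of one minus the sum of squared contribution shares: 0 means a single author wrote everything, and the value approaches 1 as the work is spread over many authors. The function name and the sample numbers are illustrative.

```python
def fractal_value(contributions):
    """Fragmentation of development effort for one file.

    contributions: dict mapping author -> number of commits (or changed lines).
    Assumed definition: 1 - sum((author_share) ** 2) over all authors.
    """
    total = sum(contributions.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in contributions.values())

single_author = fractal_value({"Author #1": 10})                 # 0.0
two_equal = fractal_value({"Author #1": 5, "Author #2": 5})      # 0.5
```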
  16. The Splinter Pattern The splinter pattern provides a structured way

    to break up hotspots into manageable pieces that can be divided among several developers to work on, rather than having a group of developers work on one large piece of code. https://pragprog.com/book/atevol/software-design-x-rays
  17. The Splinter Pattern

    Diagram labels: Original Context; Splinter Context; ActivityStack.java, Translucence.java, DeviceOwnership.java, ActivityLifecycle.java; delegate (three times); Stack behavior, Translucence behavior, Ownership behavior, Lifecycle behavior; Unmodified, copy-pasted (yes, really) content @AdamTornhill
  18. Splinter: Resulting Context ActivityStack.java Translucence.java DeviceOwnership.java ActivityLifecycle.java

    Better alignment between problem and solution domain => facilitates parallel work. Individual parts that can be refactored in isolation. @AdamTornhill
  19. ActivityStack.java @AdamTornhill Cut The Middle Man

  20. Measure and Visualize Improvements The effects of a Splinter refactoring

  21. Methods and Functions: Where Do We Start? @AdamTornhill

  22. Function Level Hotspots Parse Recommended functions to improve. Hotspots: X-Ray

    ActivityManagerService.java @AdamTornhill
  23. X-Ray of ActivityManagerService.java @AdamTornhill

  24. Tooling: Try it on your own Code

    https://codescene.io/ Source Code: https://github.com/adamtornhill/code-maat Track functions with git log -L :<funcname>:<file> @AdamTornhill
  25. Code Duplication and DRY Violations 5-20% of all Code is

    Duplicated to Some Extent @AdamTornhill
  26. @AdamTornhill DRY Violations in handleMessage (Android)

  27. @AdamTornhill DRY Violations in handleMessage (Android) Next refactoring step: Design

    Pattern COMMAND?
  28. @AdamTornhill The Dirty Secret of Copy Paste Image from https://en.wikipedia.org/wiki/Rorschach_test

  29. Change Coupling: Patterns That Emerge Over Time

    Diagram labels: Commit #1, Commit #2, Commit #3; functions A(), B(), E(); Changed code
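Change coupling can be computed by counting how often two entities appear in the same commit. The sketch below assumes each commit is given as a list of changed entities and expresses the coupling degree as shared commits divided by the average number of revisions of the pair, in percent. That formula, the function name, and the sample commits are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def change_coupling(commits, min_shared=1):
    """Return (a, b, shared_commits, degree_percent) for co-changing pairs."""
    revisions = Counter()
    shared = Counter()
    for entities in commits:
        unique = sorted(set(entities))
        revisions.update(unique)
        shared.update(combinations(unique, 2))
    results = []
    for (a, b), count in shared.items():
        if count < min_shared:
            continue
        average_revs = (revisions[a] + revisions[b]) / 2
        results.append((a, b, count, round(100 * count / average_revs)))
    # Strongest coupling first, then alphabetical for stable output.
    results.sort(key=lambda r: (-r[3], r[0], r[1]))
    return results

# Hypothetical history: A() and B() change in every commit, E() only once.
coupling = change_coupling([["A", "B"], ["A", "B", "E"], ["A", "B"]])
# coupling[0] == ("A", "B", 3, 100)
```

In practice a `min_shared` threshold above 1 filters out pairs that changed together only by coincidence.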
  30. Case Study: Code Clones in Linux @AdamTornhill The Linux Kernel

    in Numbers 16 Million Lines of Code 15 Million Lines of C 15,000 Unique Authors
  31. Inside the Main Hotspot: intel_display.c 11,383 Lines of Code 3,040

  32. Inside The Main Hotspot: intel_display.c 11,383 Lines of Code 3,040

  33. None
  34. Inside the Main Hotspot: intel_display.c 11,383 Lines of Code 3,040

  35. A bug due to omission? A context-specific check?

  36. Combine copy-paste detection techniques with change coupling to identify the

    code clones that really need refactoring. Clone Detector Applications: Clone Digger (Java and Python): http://clonedigger.sourceforge.net/ Simian (.NET): http://www.harukizaemon.com/simian/ @AdamTornhill
  37. Duplication goes Beyond Code Similarity @AdamTornhill

  38. Case Study: Refactoring Entity Framework Core @AdamTornhill Entity Framework Core

    in Numbers 574,000 Lines of Code 365,000 Lines of C# 102 Unique Authors
  39. Code Clones in Entity Framework - Refactor?

  40. The Principle of Proximity

  41. The Principle of Proximity in Code @AdamTornhill

  42. The Proximity Refactoring

  43. Image from https://thedailywtf.com/articles/comments/Enterprise-Dependency-Big-Ball-of-Yarn Reducing Complex Inter-Dependencies

  44. Image from https://thedailywtf.com/articles/comments/Enterprise-Dependency-Big-Ball-of-Yarn The Heuristic of Surprise

  45. Case Study: jUnit5 @AdamTornhill jUnit5 in Numbers 59,000 Lines of

    Code 51,000 Lines of Java 74 Unique Authors
  46. Unit tests that change together in ~70% of all commits

  47. X-Ray of the Coevolving jUnit Tests @AdamTornhill

  48. Let’s Look at the Code… ExceptionHandlingTests.java ReportingTests.java TestCaseWithInheritanceTests.java @AdamTornhill

  49. Refactoring jUnit: Express Our Domain

  50. Refactoring jUnit: Express Our Domain

  51. Express Tests in the Language of your Domain; Generic Assertions

    are at Odds with that Goal. @AdamTornhill
  52. The Costs of Implicit Dependencies Increase with Architectural Distance @AdamTornhill

  53. UI code implemented in JavaScript… …change coupled to backend code

    implemented in Clojure (different repository) @AdamTornhill
  54. Git Repository Analysis Engine Backend UI Data Mining Git Repository

    Git Repository Change Coupling!
  55. How Can You Track Changes Across Repositories? @AdamTornhill

    A specific commit: e9e57e48 2017-05-26 D.Cooper cooper@unknown.org
    Bugfix: the owls are not what they seem. Task: CLJ-2141
    Use a Task ID or Ticket Reference
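When commits in different repositories mention the same task ID, that ID can serve as the logical change set. A minimal sketch, assuming ticket references look like `CLJ-2141` (uppercase project key, hyphen, number) and that commits are available as `(repository, message, files)` tuples. The regular expression, function names, repository names, and file names are illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed ticket format: uppercase project key, hyphen, number (e.g. CLJ-2141).
TASK_ID = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")

def files_by_task(commits):
    """Group changed files by the task ID found in each commit message."""
    tasks = defaultdict(set)
    for repo, message, files in commits:
        for task in set(TASK_ID.findall(message)):
            tasks[task].update((repo, path) for path in files)
    return tasks

def cross_repo_tasks(tasks):
    """Keep only the tasks whose changes span more than one repository."""
    return {
        task: entries
        for task, entries in tasks.items()
        if len({repo for repo, _ in entries}) > 1
    }

# Hypothetical commits from two repositories.
history = [
    ("ui", "Bugfix: the owls are not what they seem. Task: CLJ-2141", ["app.js"]),
    ("backend", "CLJ-2141 adjust the analysis rules", ["core.clj"]),
    ("backend", "CLJ-2200 unrelated cleanup", ["db.clj"]),
]
tasks = files_by_task(history)
spanning = cross_repo_tasks(tasks)
```

Files that repeatedly show up under the same task IDs are change coupled, even though no single commit contains both.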
  56. UI code implemented in JavaScript… …change coupled to backend code

    implemented in Clojure (different repository) @AdamTornhill
  57. X-Ray Across Repository Boundaries JavaScript function… …that changes with Clojure

    business logic. @AdamTornhill
  58. Let Features Drive Architectural Building Blocks, not Technology. @AdamTornhill

  59. Blog on Behavioral Code Analysis: http://www.empear.com/blog/

    Your Code As A Crime Scene: https://pragprog.com/book/atcrime/your-code-as-a-crime-scene
    Software Design X-Rays: https://pragprog.com/book/atevol/software-design-x-rays
    Test the Analyses in CodeScene: https://codescene.io/
    adam.tornhill@empear.com @AdamTornhill