Using forensic techniques for targeted refactoring

Riaan
March 08, 2016

Transcript

  1. Who am I?
     - More than a decade of software dev experience
     - Mobile app developer by day
     - Purveyor of strange topics by night
     - I’ve dabbled in AI, computer vision, robotics and even cooking
     - Please remember to rate my talk: http://www.devconf.co.za/rate
  2. The enemy of change
     - Complexity
     - If our job is to understand code, how do we make that job easier?
  3. Tools I used
     - Git (specifically git log)
     - Code Maat
     - Python
     - D3.js (JavaScript library)
  4. Forget the tools
     - It’s not about the tools, but rather the techniques
     - These tools simplify some parsing, processing or visualisation
     - You can write your own scripts for any of these functions
  5. Offender profiling
     - You probably know something about offender profiling.
     - Hollywood loves it:
       • The Silence of the Lambs
       • Numb3rs
       • Criminal Minds
       • NCIS
       • Many more…
  6. Geographic profiling
     - Based on statistics and psychology.
     - Same principle as a police officer sticking pins in a map
  7. Applying geographical profiling to code
     - What if a hotspot analysis could narrow down areas of bad code?
  8. Add a spatial component
     - Hopefully you all use a VCS.
     - We need to focus on areas with high developer activity
  9. Add a spatial component

         git log --pretty=format:'[%h] %an %ad %s' --date=short --numstat > git.log
         maat.bat -l git.log -c git -a revisions > metric_data.csv
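Since it's the technique that matters and not the tool, here's a minimal Python sketch of what a revisions analysis boils down to: parse the log written by the git log command above and count how many commits touched each file. It assumes the exact '[%h] %an %ad %s' plus --date=short --numstat layout; parse_git_log and revisions are hypothetical names, and this is not Code Maat's own implementation.

```python
import re
from collections import Counter

# Commit header written by --pretty=format:'[%h] %an %ad %s' with --date=short,
# e.g. "[a1b2c3d] Jane Doe 2013-01-02 Fix mapping" (the subject may be empty).
HEADER = re.compile(r"^\[([0-9a-f]+)\] (.*?) (\d{4}-\d{2}-\d{2})(?: (.*))?$")
# --numstat line: lines added<TAB>lines deleted<TAB>path ("-" for binary files).
NUMSTAT = re.compile(r"^(\d+|-)\t(\d+|-)\t(.+)$")


def parse_git_log(text):
    """Turn the raw log text into a list of commits with the files each one touched."""
    commits = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            rev, author, date, subject = header.groups()
            commits.append({"rev": rev, "author": author, "date": date,
                            "subject": subject or "", "files": []})
            continue
        stat = NUMSTAT.match(line)
        if stat and commits:
            commits[-1]["files"].append(stat.group(3))
    return commits


def revisions(commits):
    """Count the commits that touched each file, most frequently changed first."""
    # dict.fromkeys de-duplicates paths within a commit while keeping a stable order.
    counts = Counter(path for commit in commits
                     for path in dict.fromkeys(commit["files"]))
    return counts.most_common()
```

For example, revisions(parse_git_log(open("git.log").read())) gives (path, revision count) pairs, ready to be written out as CSV.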
  10. Profiling your codebase
      - Choose a timespan for your analysis
      - Get frequency data
      - Add complexity data
      - Merge complexity and effort
      - Visualise this data
  11. Profiling your codebase
      - We’ll look at the Hibernate ORM:

            git clone https://github.com/hibernate/hibernate-orm.git
  12. Profiling your codebase
      - Choosing a timeframe
      - Don’t look at the entire life of the project
      - What timeframe you use depends on your development methodology:
        • Between releases
        • Over iterations
        • Around significant events (reorganisation of code or teams)
  13. Profiling your codebase
      - Generate a log:

            git log --pretty=format:'[%h] %an %ad %s' --date=short --numstat --before=2013-09-05 --after=2012-01-01 > hib_evo.log
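Slicing the history into several windows (between releases, per iteration) means running that command repeatedly, so it can be worth scripting. A minimal sketch, assuming Python 3.7+ and git on the PATH; git_log_command and write_log are hypothetical helper names, not part of any tool used in the talk.

```python
import subprocess


def git_log_command(after, before):
    """Argument list for a git log restricted to one analysis window (YYYY-MM-DD dates).

    No shell is involved, so the format string needs no extra quoting.
    """
    return [
        "git", "log",
        "--pretty=format:[%h] %an %ad %s",
        "--date=short",
        "--numstat",
        f"--after={after}",
        f"--before={before}",
    ]


def write_log(repo_dir, after, before, out_path):
    """Run git log inside repo_dir and save its output, like the '> hib_evo.log' redirect."""
    result = subprocess.run(git_log_command(after, before), cwd=repo_dir,
                            capture_output=True, encoding="utf-8",
                            errors="replace", check=True)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
```

For the window above that would be write_log("hibernate-orm", "2012-01-01", "2013-09-05", "hib_evo.log"), assuming the clone lives in ./hibernate-orm.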
  14. Profiling your codebase
      - A summary of the changes shows some interesting things:

            prompt> maat -l hib_evo.log -c git -a summary
            statistic,value
            number-of-commits,1346
            number-of-entities,10193
            number-of-entities-changed,18258
            number-of-authors,89
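For the curious, a sketch of how such a summary could be computed from the same log. It assumes the '[%h] %an %ad %s' plus --numstat layout, and it assumes number-of-entities-changed counts every (commit, file) modification, which would explain why it can exceed number-of-entities. That is an interpretation of the statistics, not Code Maat's code; summary is a hypothetical name.

```python
import re

# Commit header from --pretty=format:'[%h] %an %ad %s' --date=short.
HEADER = re.compile(r"^\[([0-9a-f]+)\] (.*?) (\d{4}-\d{2}-\d{2})(?: (.*))?$")
# --numstat line: added<TAB>deleted<TAB>path.
NUMSTAT = re.compile(r"^(\d+|-)\t(\d+|-)\t(.+)$")


def summary(log_text):
    """Return (statistic, value) pairs for commits, files and authors in the log."""
    commits = 0
    authors = set()
    entities = set()
    entity_changes = 0  # assumption: one per numstat line, i.e. per (commit, file)
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            commits += 1
            authors.add(header.group(2))
            continue
        stat = NUMSTAT.match(line)
        if stat:
            entities.add(stat.group(3))
            entity_changes += 1
    return [
        ("number-of-commits", commits),
        ("number-of-entities", len(entities)),
        ("number-of-entities-changed", entity_changes),
        ("number-of-authors", len(authors)),
    ]
```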
  15. Profiling your codebase
      - Analyzing change frequencies:

            maat -l hib_evo.log -c git -a revisions > hib_freqs.csv
  16. Profiling your codebase
      - Calculate complexity
      - Complexity by lines of code?
      - Bad metric, but no worse than others…

            cloc ./ --by-file --csv --quiet --report-file=hib_lines.csv
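If cloc isn't available, a crude stand-in is easy to write. This sketch counts lines that are neither blank nor comment-only in Java-style source; it only recognises comments that start a line and knows nothing about string literals, so its numbers will drift from cloc's. count_code_lines and loc_by_file are hypothetical names.

```python
import os


def count_code_lines(source):
    """Count lines that are neither blank nor comment-only in Java-like source.

    Rough approximation: only comments that start a line are recognised.
    """
    code = 0
    in_block = False
    for raw in source.splitlines():
        line = raw.strip()
        if in_block:
            if "*/" not in line:
                continue  # still inside a /* ... */ comment
            in_block = False
            line = line.split("*/", 1)[1].strip()
        # Peel off any block comments that open this line.
        while line.startswith("/*"):
            rest = line[2:]
            if "*/" not in rest:
                in_block = True  # comment continues on the following lines
                line = ""
                break
            line = rest.split("*/", 1)[1].strip()
        if line and not line.startswith("//"):
            code += 1
    return code


def loc_by_file(root, extensions=(".java",)):
    """Map each matching file under root (path relative to root) to its code-line count."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                full = os.path.join(dirpath, name)
                with open(full, encoding="utf-8", errors="replace") as fh:
                    rel = os.path.relpath(full, root).replace(os.sep, "/")
                    counts[rel] = count_code_lines(fh.read())
    return counts
```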
  17. Profiling your codebase
      - Combine complexity and effort:

            python scripts/merge_comp_freqs.py hib_freqs.csv hib_lines.csv
            module,revisions,code
            build.gradle,79,402
            hibernate-core/.../persister/entity/AbstractEntityPersister.java,44,3983
            hibernate-core/.../cfg/Configuration.java,40,2673
            hibernate-core/.../internal/SessionImpl.java,39,2097
            hibernate-core/.../internal/SessionFactoryImpl.java,34,1384
            …
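The merge script itself isn't shown, so here's a hypothetical sketch of what merge_comp_freqs.py needs to do: join the two data sets on the file path and sort by number of revisions. It works on two dictionaries you've already loaded from the CSV files (their exact column layout is not assumed here), and it assumes cloc reports paths with a leading ./ while git doesn't.

```python
import csv
import io


def normalise(path):
    """Assumed difference: cloc run on ./ reports './src/A.java', git reports 'src/A.java'."""
    return path[2:] if path.startswith("./") else path


def merge_comp_freqs(revisions, code_lines):
    """Join change frequency and size per module, most-changed first.

    revisions: {path: number of commits}; code_lines: {path: lines of code}.
    Modules missing from either side (e.g. files deleted since) are dropped.
    """
    lines = {normalise(path): n for path, n in code_lines.items()}
    merged = [(normalise(module), revs, lines[normalise(module)])
              for module, revs in revisions.items()
              if normalise(module) in lines]
    # Sort by revisions, then by size, both descending.
    merged.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return merged


def to_csv(rows):
    """Render the merged rows with the module,revisions,code header from the slide."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["module", "revisions", "code"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Files that changed a lot but no longer exist silently drop out of the join, which is usually what you want for a hotspot view of the current code.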
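  18. Profiling your codebase
      - Now we can finally get to the fun part: Visualisation
      - I’m using a sample D3.js circle-packing algorithm
      - Due to security restrictions in modern browsers, the page typically can't load its data file straight from disk, so serve it over HTTP:

            python -m SimpleHTTPServer 8888

        (SimpleHTTPServer is the Python 2 module; on Python 3 use python -m http.server 8888.)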
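D3's circle-packing examples consume a nested JSON hierarchy rather than a flat CSV, so the merged data has to be folded into a tree first. A sketch, assuming the leaf fields are called "size" (lines of code, driving circle area) as in the common flare-style D3 samples, plus a hypothetical "weight" field (revisions relative to the most-changed file) for colouring; rename the fields to whatever your sample reads. build_tree and to_json are hypothetical names.

```python
import json


def build_tree(rows):
    """Nest (path, revisions, code) rows into name/children dicts for circle packing.

    Leaves get "size" (lines of code) and "weight" (revisions relative to the
    busiest file, a hypothetical field you could map to colour).
    """
    rows = list(rows)
    max_revs = max((revs for _, revs, _ in rows), default=0) or 1
    root = {"name": "root", "children": []}
    for path, revs, code in rows:
        node = root
        *dirs, leaf = path.split("/")
        # Walk (and create as needed) one directory node per path segment.
        for part in dirs:
            child = next((c for c in node["children"]
                          if c["name"] == part and "children" in c), None)
            if child is None:
                child = {"name": part, "children": []}
                node["children"].append(child)
            node = child
        node["children"].append({"name": leaf, "size": code,
                                 "weight": round(revs / max_revs, 3)})
    return root


def to_json(tree):
    """Serialise the tree for the D3 page to load."""
    return json.dumps(tree, indent=2)
```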
  19. Measuring complexity
      - You’ve already seen how to analyze a single revision. Now we want to:
        1. Take a range of revisions for a specific module.
        2. Calculate the indentation complexity of the module as it occurred in each revision.
        3. Output the results revision by revision for further analysis.
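Indentation works as a complexity proxy because every nested if, loop or block pushes code further right. A minimal sketch of the per-revision calculation, assuming 4 spaces (or one tab) per logical level, blank lines skipped and a population standard deviation; these are assumptions, and logical_indent and indentation_complexity are hypothetical names rather than the script used in the talk.

```python
import statistics


def logical_indent(line, spaces_per_level=4):
    """Leading whitespace as indentation levels.

    Assumed convention: each tab is one level, every spaces_per_level spaces are one level.
    """
    spaces = tabs = 0
    for ch in line:
        if ch == " ":
            spaces += 1
        elif ch == "\t":
            tabs += 1
        else:
            break
    return spaces // spaces_per_level + tabs


def indentation_complexity(source, spaces_per_level=4):
    """(n, total, mean, sd) of indentation levels over the non-blank lines."""
    levels = [logical_indent(line, spaces_per_level)
              for line in source.splitlines() if line.strip()]
    if not levels:
        return 0, 0, 0.0, 0.0
    total = sum(levels)
    return (len(levels), total, round(total / len(levels), 2),
            round(statistics.pstdev(levels), 2))
```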
  20. Measuring complexity

          python scripts/git_complexity_trend.py --start ccc087b --end 46c962e --file hibernate-core/src/main/java/org/hibernate/cfg/Configuration.java
          rev, n, total, mean, sd
          e75b8a7, 3080, 7610, 2.47, 1.76
          23a6280, 3092, 7649, 2.47, 1.76
          8991100, 3100, 7658, 2.47, 1.76
          8373871, 3101, 7658, 2.47, 1.76
          …
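A sketch of the trend calculation, assuming git on the PATH: git rev-list lists the commits between two revisions that touched the file, git show <rev>:<path> retrieves the file as it was, and each version goes through the same indentation measure (4 spaces or one tab per level, an assumption). Note that start..end excludes the start commit itself. The helper names (git, revisions_between, complexity_trend, trend_for_file) are hypothetical, and this is not the git_complexity_trend.py script from the slide.

```python
import statistics
import subprocess


def indentation_complexity(source, spaces_per_level=4):
    """(n, total, mean, sd) of indentation levels over non-blank lines (tab = 1 level)."""
    levels = []
    for line in source.splitlines():
        if not line.strip():
            continue
        leading = line[:len(line) - len(line.lstrip(" \t"))]
        levels.append(leading.count(" ") // spaces_per_level + leading.count("\t"))
    if not levels:
        return 0, 0, 0.0, 0.0
    total = sum(levels)
    return (len(levels), total, round(total / len(levels), 2),
            round(statistics.pstdev(levels), 2))


def git(*args, repo="."):
    """Run a git command in repo and return its standard output."""
    return subprocess.run(["git", *args], cwd=repo, capture_output=True,
                          encoding="utf-8", errors="replace", check=True).stdout


def revisions_between(start, end, path, repo="."):
    """Commits in start..end (start itself excluded) that touched path, oldest first."""
    return git("rev-list", "--reverse", f"{start}..{end}", "--", path, repo=repo).split()


def complexity_trend(revs, read_file):
    """One (rev, n, total, mean, sd) row per revision; read_file(rev) returns the source."""
    # rev[:7] mimics the short hashes on the slide; git may need longer ones to be unique.
    return [(rev[:7], *indentation_complexity(read_file(rev))) for rev in revs]


def trend_for_file(start, end, path, repo="."):
    """Complexity trend of one file (path relative to the repository root)."""
    revs = revisions_between(start, end, path, repo)
    return complexity_trend(revs, lambda rev: git("show", f"{rev}:{path}", repo=repo))
```

For example, trend_for_file("ccc087b", "46c962e", "hibernate-core/src/main/java/org/hibernate/cfg/Configuration.java", repo="hibernate-orm") returns rows shaped like the rev, n, total, mean, sd output above; the numbers may differ from the slide's, since the indentation rule here is an assumption.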