Should I Bug You? Identifying Domain Experts in Software Projects Using Code Complexity Metrics

Any sufficiently complex software system has experts who understand parts of the system more deeply than others do.
However, it is not always clear who these experts are and which particular parts of the system they can provide help with.

We propose a framework to elicit the expertise of developers and recommend experts by analyzing how code complexity measures develop over time, both per author and per component.
Teams can use this approach to detect parts of the software for which no experts, or only a few, currently exist, and can take preventive action to keep collective code knowledge and ownership high.

We employed the developed approach at a medium-sized company.
The results were evaluated with a survey, comparing the perceived and the computed expertise of developers.
We show that aggregated code metrics can be used to identify experts for different software components.
The identified experts were rated as acceptable candidates by developers in over 90% of all cases.


Christoph Matthies

July 28, 2017


  1. Should I Bug You? Identifying Domain Experts in Software Projects Using Code Complexity Metrics

    Ralf Teusner, Christoph Matthies, Philipp Giese
    Hasso Plattner Institute, University of Potsdam, Germany
    QRS’17, Prague, July 2017
    @chrisma0
  2. Background

    Motivation

    “Truck Factor: The number of people on your team who have to be hit with a truck before the project is in serious trouble.” [1]

    ▪ Any system develops domain experts over time
    ▪ High Truck Number → domain “gurus”
    ▪ Low collective code ownership
    ▪ Can lead to Conway’s Law

    [1] Michael Bowler. “Truck Factor”. May 15, 2005.
  3. Research Question

    The knowledge we seek

    Who is the domain expert for which part of the software?

    ▪ Who should I ask when I’m in need of assistance?
    ▪ Who is most qualified to write the documentation?
    ▪ Who is most qualified to review this piece of code?
    ▪ In which areas can knowledge sharing be improved?
  4. Challenges & Goals

    How can we find the gurus without “bugging” them?

    ▪ Developers are busy
    ▪ Project documentation is likely out of date
    ▪ Avoid overhead of documenting domain expertise
    ▪ Idea: Use already existing artifacts, i.e. code
    ▪ Analyze code to attribute expertise to developers
  5. Code Analysis

    From Code to Domain Expertise

    ▪ Apply proven complexity metrics to code
    ▪ Case-by-case basis, no set of metrics can fit all contexts
    ▪ Consider knowledge of metrics within a software team
    ▪ In this case study:
        ▪ Lines of code
        ▪ Efferent coupling (Fan-Out) & afferent coupling (Fan-In)
        ▪ Cyclomatic Complexity
        ▪ Halstead difficulty & volume
  6. Cyclomatic Complexity

    aka McCabe Complexity [2]

    Measurement of the number of linearly independent paths through a program's source code

    CC = #Edges − #Nodes + 2*#Components

    Example: 9 edges, 8 nodes, 1 connected component.
    Cyclomatic complexity: 9 − 8 + 2*1 = 3

    [Figure: control-flow graph with the nodes Start, A, While, B, If, C, D, End]

    [2] T. J. McCabe. “A complexity measure”. IEEE Transactions on Software Engineering, no. 4. pp. 308–320. 1976.
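The example on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of Analyzr: the function name `cyclomatic_complexity` and the edge list are assumptions. The slide only gives the node names and the counts (9 edges, 8 nodes), so the edges below are one hypothetical wiring that matches those counts.

```python
from collections import defaultdict


def cyclomatic_complexity(nodes, edges):
    """Return #Edges - #Nodes + 2 * #Components for a control-flow graph."""
    # Treat edges as undirected to count connected components.
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    components = 0
    for node in nodes:
        if node in seen:
            continue
        components += 1
        stack = [node]
        while stack:  # depth-first walk over one component
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(neighbours[current] - seen)

    return len(edges) - len(nodes) + 2 * components


# Hypothetical edges using the node names from the slide (9 edges, 8 nodes).
NODES = ["Start", "A", "While", "B", "If", "C", "D", "End"]
EDGES = [
    ("Start", "A"), ("A", "While"), ("While", "B"), ("B", "While"),
    ("While", "If"), ("If", "C"), ("If", "D"), ("C", "End"), ("D", "End"),
]

print(cyclomatic_complexity(NODES, EDGES))  # 9 - 8 + 2*1 = 3
```

The result also matches the common shortcut of counting decision points plus one: one loop and one branch give 2 + 1 = 3.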
  7. Halstead Difficulty & Volume

    A subset of Halstead complexity measures [3]

    Idea: complexity based on numbers of operators (e.g. reserved words) and operands (e.g. variables)

    ▪ η1, N1: number of distinct and total operators
    ▪ η2, N2: number of distinct and total operands
    ▪ Volume = N · log2(η), with N = N1 + N2 and η = η1 + η2 (η := vocabulary size)
    ▪ Difficulty = (η1 / 2) · (N2 / η2)

    [3] Halstead, Maurice H. “Elements of Software Science”. Amsterdam: Elsevier North-Holland, Inc. 1977. ISBN 0-444-00205-7.
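As a rough illustration, the two measures can be computed from token lists using the standard Halstead definitions, Volume = N · log2(η) and Difficulty = (η1 / 2) · (N2 / η2). The function name `halstead` and the hand-made tokenization of `a = b + c` are assumptions for this sketch; real tools such as the ones used in the study classify tokens per language.

```python
import math


def halstead(operators, operands):
    """Return (volume, difficulty) for lists of operator and operand tokens."""
    eta1, eta2 = len(set(operators)), len(set(operands))  # distinct counts
    n1, n2 = len(operators), len(operands)                # total counts
    vocabulary = eta1 + eta2
    length = n1 + n2
    volume = length * math.log2(vocabulary)
    difficulty = (eta1 / 2) * (n2 / eta2)  # assumes at least one operand
    return volume, difficulty


# Hand-tokenized statement `a = b + c`: 2 operators, 3 operands.
volume, difficulty = halstead(["=", "+"], ["a", "b", "c"])
print(round(volume, 2), difficulty)  # 11.61 1.0
```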
  8. Fan-In & Fan-Out Metrics

    aka Afferent & Efferent Coupling [4]

    ▪ Afferent Coupling (Ca, Fan-In): number of elements that depend on a code element
    ▪ Efferent Coupling (Ce, Fan-Out): number of elements that a code element depends upon

    [4] S. Henry and D. Kafura. “Software structure metrics based on information flow”. IEEE Transactions on Software Engineering, no. 5. pp. 510–518. 1981.
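Both counts can be derived from a plain dependency map, where afferent coupling (Ca, fan-in) counts the elements that depend on a code element and efferent coupling (Ce, fan-out) counts the elements it depends upon. The sketch below is an illustration only; the function name `coupling` and the component names are hypothetical.

```python
from collections import defaultdict


def coupling(dependencies):
    """Map each element to a (Ca, Ce) tuple: (fan-in, fan-out)."""
    depends_on = defaultdict(set)   # element -> elements it uses
    depended_by = defaultdict(set)  # element -> elements that use it
    for element, targets in dependencies.items():
        depends_on[element]  # register elements without dependencies
        for target in targets:
            if target == element:
                continue  # ignore self-references
            depends_on[element].add(target)
            depended_by[target].add(element)

    elements = set(depends_on) | set(depended_by)
    return {e: (len(depended_by[e]), len(depends_on[e])) for e in sorted(elements)}


# Hypothetical components and their dependencies.
DEPENDENCIES = {
    "OrderService": ["Database", "Logger"],
    "UserService": ["Database", "Logger"],
    "Database": ["Logger"],
    "Logger": [],
}

for name, (ca, ce) in coupling(DEPENDENCIES).items():
    print(f"{name}: Ca={ca}, Ce={ce}")
```

Here `Logger` has the highest fan-in (three dependents) and no fan-out, while the two services have fan-out 2 and fan-in 0.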
  9. Metrics in a Real Project

    Changes in complexity measures related to real-world project events

    [Figure: complexity measures over time; annotations: 2011, “Start-Up” phase, Maintenance phase]
  10. Analyzr Framework

    Analyzing every commit of a project
  11. Analyzr Framework

    Details for a single commit

    [Screenshot: per-commit metric values for Cyclomatic Complexity, Halstead Volume, Halstead Difficulty, Fan-In, Fan-Out, Source Lines of Code]
  12. Analyzr Architecture

    Under the Hood

    ▪ Git & SVN
    ▪ JHawk (Java) & Complexity Report (JS)
    ▪ Python & Django
  13. Metric Aggregation

    The Software Quality Enhancement (Squale) Model

    ▪ Squale: bounded, continuous scale for comparison of metric values [5]
    ▪ Combines low-level marks (raw metric values) into individual marks (IM)
    ▪ IM mapped to unified scale (from 0 to 3), determined by experts [6]
    ▪ IM then aggregated (weighted) to form global mark

    [5] Mordal-Manet et al. “The Squale model – A practice-based industrial quality model”. IEEE International Conference on Software Maintenance. 2009.
    [6] Balmas et al. “Software metric for Java and C++ practices”. Research Report. pp. 44. 2010.
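The two aggregation steps might look roughly like the following sketch. Everything concrete here is an assumption: the breakpoints that map cyclomatic complexity onto the 0 to 3 scale are made up (the slide says such mappings are determined by experts), and the weighting, which averages λ^(−IM) and maps back with a logarithm so that low marks dominate, is one reading of the Squale aggregation that should be checked against [5].

```python
import math


def individual_mark(raw, breakpoints):
    """Map a raw metric value onto [0, 3] by piecewise-linear interpolation.

    breakpoints: (raw_value, mark) pairs sorted by raw_value.
    """
    first_raw, first_mark = breakpoints[0]
    if raw <= first_raw:
        return first_mark
    for (r0, m0), (r1, m1) in zip(breakpoints, breakpoints[1:]):
        if raw <= r1:
            return m0 + (m1 - m0) * (raw - r0) / (r1 - r0)
    return breakpoints[-1][1]  # clamp values beyond the last breakpoint


def global_mark(marks, lam=9.0):
    """Aggregate individual marks; a larger lam punishes low marks harder."""
    weighted = sum(lam ** -mark for mark in marks) / len(marks)
    return -math.log(weighted, lam)


# Hypothetical expert mapping for cyclomatic complexity (lower is better).
CC_BREAKPOINTS = [(2, 3.0), (10, 2.0), (20, 1.0), (30, 0.0)]

print(individual_mark(6, CC_BREAKPOINTS))      # 2.5
print(round(global_mark([3.0, 3.0, 0.0]), 2))  # 0.5
```

Note how a single mark of 0 pulls the global mark far below the arithmetic mean of 2, which is the point of a weighted aggregation: one badly rated element should not be hidden by many good ones.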
  14. Expertise Extraction

    From metrics to knowledge about developers

    ▪ Determine changes in code metrics, i.e. deltas, for each developer over time
    ▪ Identify influence of commit on component’s global mark
    ▪ Expertise: ratio of commits that increase / decrease marks (quality impact, qi), smoothed by total author commits

    [Formula: Expertise(a)]
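The exact Expertise(a) formula is not preserved in this transcript, so the following is only a sketch of the general idea: classify each commit by whether it raised or lowered a component's global mark, then damp the resulting ratio for authors with few commits. The quality-impact definition and the additive `smoothing` constant are hypothetical stand-ins, not the authors' formula, and the commit data is invented.

```python
from collections import defaultdict


def expertise_by_author(commits, smoothing=5):
    """Score authors for one component.

    commits: iterable of (author, mark_before, mark_after) tuples.
    smoothing: hypothetical constant that damps authors with few commits.
    """
    improved = defaultdict(int)
    worsened = defaultdict(int)
    total = defaultdict(int)
    for author, before, after in commits:
        total[author] += 1
        if after > before:
            improved[author] += 1
        elif after < before:
            worsened[author] += 1

    scores = {}
    for author, count in total.items():
        # Assumed quality impact: net share of mark-improving commits.
        quality_impact = (improved[author] - worsened[author]) / count
        scores[author] = quality_impact * count / (count + smoothing)
    return scores


# Invented commit history for a single component.
COMMITS = [
    ("alice", 2.0, 2.5), ("alice", 2.5, 2.75), ("alice", 2.75, 2.5),
    ("alice", 2.5, 3.0), ("bob", 1.0, 2.0),
]

scores = expertise_by_author(COMMITS)
print(sorted(scores, key=scores.get, reverse=True))  # ['alice', 'bob']
```

Without the damping term, a single improving commit would give `bob` a perfect ratio; with it, the author with the longer track record ranks first.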
  15. Evaluation

    Assessing the quality of results with surveys

    ▪ Two surveys performed for evaluation
        ▪ Expert identification: who is currently being asked (without knowledge of Analyzr results)
        ▪ Proposal evaluation: who should be asked (with knowledge of results)
    ▪ Bounded time frame of observation
        ▪ Distinguish temporary and permanent leave
        ▪ In this study: 62 days
  16. Expert Identification Survey

    Who is currently being identified as domain expert

    ▪ Task: Identify top 3 domain experts for front and back end components, prior to tool introduction
    ▪ Total agreement between participants on top expert
        ▪ In front end components: 55%
        ▪ In back end components: 33.3%
    ▪ Majority agreed on top expert in 88% of total choices
    → Developers have a specific component expert in mind
  17. Expert Identification Survey

    Comparing Analyzr predictions to intuitive survey data

    ▪ Accuracy of Analyzr predictions for first choice of domain expert vs intuitive developer picks
        ▪ Front and back end combined: 47.37% match
        ▪ Back end: 71.43% match
        ▪ Front end: 50% miss
  18. Analyzr Proposal Evaluation

    Survey on what developers think of suggestions

    ▪ Developers asked to rate first, second, third choice of component expert suggested
    ▪ Scale: strong disagree (0), disagree (1), agree (2), strong agree (3)
    ▪ Back end: 100% agreement (87.5% strong agree)
    ▪ Front end: 90% agreement (48.5% strong agree)
  19. Summary & Conclusion

    Take-away messages

    ▪ Feasibility of identifying experts using code complexity
    ▪ Algorithmically identified experts differed from intuitive selections
    ▪ Algorithmically identified experts rated as accurate in 90% of cases
    → Evidence for non-obvious component experts, i.e. “hidden experts”
    → Asking for the “guru” might not be ideal, might simply get you the default person

    @chrisma0
  20. Image Credits

    In order of appearance

    ▪ Recruitment by Gerald Wildmoser from the Noun Project (CC BY 3.0 US)
    ▪ Truck by Mello from the Noun Project (CC BY 3.0 US)
    ▪ questions by Gregor Cresnar from the Noun Project (CC BY 3.0 US)
    ▪ Target by Arthur Shlain from the Noun Project (CC BY 3.0 US)
    ▪ Search Code by icon 54 from the Noun Project (CC BY 3.0 US)
    ▪ Difficulty Gauge by Thanh Nguyen from the Noun Project (CC BY 3.0 US)
    ▪ puzzles by Kirby Wu from the Noun Project (CC BY 3.0 US)
    ▪ clipboard by David from the Noun Project (CC BY 3.0 US)
    ▪ Seo expert by H Alberto Gongora from the Noun Project (CC BY 3.0 US)
    ▪ Idea by Gilbert Bages from the Noun Project (CC BY 3.0 US)