
Should I Bug You? Identifying Domain Experts in Software Projects Using Code Complexity Metrics


Any sufficiently complex software system has experts who have a deeper understanding of parts of the system than others do.
However, it is not always clear who these experts are and which particular parts of the system they can provide help with.

We propose a framework to elicit the expertise of developers and recommend experts by analyzing how code complexity measures develop over time, both per author and per component.
Teams can use this approach to detect those parts of the software for which no or only a few experts currently exist, and can take preventive action to keep collective code knowledge and ownership high.

We applied the approach at a medium-sized company.
The results were evaluated with a survey comparing developers’ perceived expertise with the computed expertise.
We show that aggregated code metrics can be used to identify experts for different software components.
The identified experts were rated as acceptable candidates by developers in over 90% of all cases.

Christoph Matthies

July 28, 2017



Transcript

  1. Hasso Plattner Institute
    University of Potsdam, Germany
    [email protected]
    @chrisma0
    Should I Bug You?
    Identifying Domain Experts in Software Projects
    Using Code Complexity Metrics
    Ralf Teusner, Christoph Matthies, Philipp Giese
    QRS’17, Prague, July 2017



  2. Background
    Truck Factor
    The number of people on your team who have to
    be hit with a truck before the project is in serious trouble. [1]
    [1] Michael Bowler. “Truck Factor”. May 15, 2005.
    http://www.agileadvice.com/2005/05/15/agilemanagement/truck-factor/
    ■ Any system develops domain experts over time
    ■ Reliance on a few domain “gurus” → low Truck Number
    ■ Low collective code ownership
    ■ Can lead to Conway’s Law

    Motivation


  3. Research Question
    ■ Who should I ask when I’m in need of assistance?
    ■ Who is most qualified to write the documentation?
    ■ Who is most qualified to review this piece of code?
    ■ In which areas can knowledge sharing be improved?
    The knowledge we seek
    Who is the domain expert
    for which part of the software?


  4. Challenges & Goals
    ■ Developers are busy
    ■ Project documentation is likely out of date
    ■ Avoid overhead of documenting domain expertise
    ■ Idea: Use already existing artifacts, i.e. code
    ■ Analyze code to attribute expertise to developers
    How can we find the gurus without “bugging” them?


  5. Code Analysis
    ■ Apply proven complexity metrics to code
    ■ Chosen on a case-by-case basis; no single set of metrics fits all contexts
    ■ Consider the team’s familiarity with the chosen metrics
    ■ In this case study:
    ■ Lines of code
    ■ Efferent coupling (fan-out) &
    afferent coupling (fan-in)
    ■ Cyclomatic complexity
    ■ Halstead difficulty & volume
    From Code to Domain Expertise


  6. Cyclomatic Complexity
    Measurement of the number of
    linearly independent paths through
    a program's source code
    CC = #Edges − #Nodes + 2*#Components
    Example:
    9 edges, 8 nodes, 1 connected component.
    Cyclomatic complexity: 9 - 8 + 2*1 = 3
    aka McCabe Complexity [2]
    [Control flow graph of the example, with nodes Start, While, If, A, B, C, D, End]
    [2] T. J. McCabe. “A complexity measure,” IEEE Transactions on
    Software Engineering, no. 4. pp. 308–320. 1976.
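
The formula on the slide is simple enough to compute directly. A minimal Python sketch using the slide's example graph; the function name is illustrative and not part of Analyzr:

```python
def cyclomatic_complexity(num_edges, num_nodes, num_components=1):
    """McCabe's cyclomatic complexity: CC = E - N + 2P."""
    return num_edges - num_nodes + 2 * num_components

# Example from the slide: 9 edges, 8 nodes, 1 connected component -> 3
assert cyclomatic_complexity(9, 8, 1) == 3
```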


  7. Halstead Difficulty & Volume
    Idea: complexity based on the numbers of operators
    (e.g. reserved words) and operands (e.g. variables)
    ■ η1, N1: number of distinct and total operators
    ■ η2, N2: number of distinct and total operands
    ■ Volume = N × log2(η), with program length N = N1 + N2
    and vocabulary size η = η1 + η2
    ■ Difficulty = (η1 / 2) × (N2 / η2)
    A subset of the Halstead complexity measures [3]
    [3] Halstead, Maurice H. “Elements of Software Science”. Amsterdam:
    Elsevier North-Holland, Inc. 1977. ISBN 0-444-00205-7.
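
These are the standard Halstead definitions; the following small Python sketch spells them out. The example counts are made up, and the functions are not part of Analyzr:

```python
import math

def halstead_volume(n1, N1, n2, N2):
    """Volume = N * log2(eta): length N = N1 + N2, vocabulary eta = n1 + n2."""
    vocabulary = n1 + n2   # distinct operators + distinct operands
    length = N1 + N2       # total operators + total operands
    return length * math.log2(vocabulary)

def halstead_difficulty(n1, N1, n2, N2):
    """Difficulty = (n1 / 2) * (N2 / n2)."""
    return (n1 / 2) * (N2 / n2)

# Illustrative counts: 10 distinct / 40 total operators, 7 distinct / 25 total operands
print(round(halstead_volume(10, 40, 7, 25), 1))      # 265.7
print(round(halstead_difficulty(10, 40, 7, 25), 2))  # 17.86
```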


  8. Fan-In & Fan-Out Metrics
    aka Efferent & Afferent Coupling [4]
    ■ Efferent coupling (Ce, fan-out): number of elements that a
    code element depends upon
    ■ Afferent coupling (Ca, fan-in): number of elements that
    depend on a code element
    [4] S. Henry and D. Kafura. “Software structure metrics based on information flow”.
    IEEE Transactions on Software Engineering, no. 5. pp. 510–518. 1981.
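
A small sketch of how the two coupling counts fall out of a dependency map; the element names are invented for illustration and this is not Analyzr's actual implementation:

```python
# {element: set of elements it depends on}; names are made up
dependencies = {
    "api":     {"billing", "auth"},
    "billing": {"db", "auth"},
    "auth":    {"db"},
    "db":      set(),
}

def efferent_coupling(deps):
    """Ce (fan-out): how many elements each element depends upon."""
    return {elem: len(targets) for elem, targets in deps.items()}

def afferent_coupling(deps):
    """Ca (fan-in): how many elements depend on each element."""
    ca = {elem: 0 for elem in deps}
    for targets in deps.values():
        for target in targets:
            ca[target] += 1
    return ca

print(efferent_coupling(dependencies))  # {'api': 2, 'billing': 2, 'auth': 1, 'db': 0}
print(afferent_coupling(dependencies))  # {'api': 0, 'billing': 1, 'auth': 2, 'db': 2}
```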


  9. Metrics in a Real Project
    Changes in complexity measures related to real-world project events
    [Chart: complexity metrics over the project history, with a “Start-Up” phase (around 2011) and a later maintenance phase annotated]


  10. Analyzr Framework
    Analyzing every commit of a project


  11. Analyzr Framework
    Details for a single commit
    [Screenshot: per-commit values for cyclomatic complexity, Halstead volume, Halstead difficulty, fan-in, fan-out, and source lines of code]
    https://github.com/firebug/firebug/commit/076da997e6bc0cb14b27afcc2d845c730de52fcf


  12. Analyzr Architecture
    Under the Hood
    [Architecture diagram: Python & Django core, Git & SVN repository backends, metrics computed via JHawk (Java) and Complexity Report (JS)]
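
The slides do not show Analyzr's code, so the following is only a rough sketch of the per-commit analysis loop such an architecture implies. It uses GitPython for the Git side (SVN is omitted), and `run_external_metrics` is a hypothetical stand-in for the calls to JHawk or Complexity Report:

```python
import git  # GitPython

def run_external_metrics(worktree_path):
    """Hypothetical stand-in: invoke a metrics tool (e.g. JHawk for Java,
    Complexity Report for JS) on the checked-out worktree and parse its output."""
    raise NotImplementedError

def analyze_history(repo_path, branch="master"):
    """Walk every commit on a branch and record metric values per commit."""
    repo = git.Repo(repo_path)
    results = []
    for commit in repo.iter_commits(branch):
        repo.git.checkout(commit.hexsha)  # put the worktree at this commit
        results.append({
            "sha": commit.hexsha,
            "author": commit.author.email,
            "date": commit.committed_datetime,
            "metrics": run_external_metrics(repo_path),
        })
    return results
```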


  13. Metric Aggregation
    ■ Squale: bounded, continuous scale
    for comparison of metric values [5]
    ■ Combines low-level marks (raw metric values)
    into individual marks (IM)
    ■ IMs are mapped to a unified scale (0 to 3)
    using thresholds determined by experts [6]
    ■ IMs are then aggregated (weighted) into a global mark
    The Software Quality Enhancement (Squale) Model
    [5] Mordal-Manet et al. "The squale model—A practice-based industrial quality model."
    IEEE International Conference on Software Maintenance. 2009.
    [6] Balmas et al. “Software metrics for Java and C++ practices”. Research Report, 44 pp. 2010.
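
To make the aggregation step concrete, here is a simplified sketch of the idea, not the exact Squale formulas from [5]/[6]: raw values are turned into 0 to 3 individual marks via expert-chosen thresholds, then combined with weights into a global mark. The thresholds and weights below are placeholders.

```python
def individual_mark(value, thresholds):
    """Map a raw metric value to an individual mark on the 0-3 scale.
    `thresholds` are the expert-chosen limits for marks 3, 2 and 1
    (lower raw values are assumed to be better)."""
    for mark, limit in zip((3, 2, 1), thresholds):
        if value <= limit:
            return mark
    return 0

def global_mark(weighted_marks):
    """Weighted combination of individual marks into a global mark.
    (The real Squale model penalizes bad marks more strongly than
    a plain weighted average does.)"""
    total = sum(weight for _, weight in weighted_marks)
    return sum(mark * weight for mark, weight in weighted_marks) / total

cc_mark = individual_mark(12, thresholds=(5, 10, 20))       # cyclomatic complexity -> 1
loc_mark = individual_mark(80, thresholds=(100, 300, 600))  # lines of code -> 3
print(round(global_mark([(cc_mark, 2.0), (loc_mark, 1.0)]), 2))  # 1.67
```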


  14. Expertise Extraction
    ■ Determine changes in code metrics, i.e. deltas,
    for each developer over time
    ■ Identify the influence of each commit on a component’s global mark
    ■ Expertise(a): ratio of commits by author a that increase vs. decrease
    marks (quality impact, qi), smoothed by the author’s total number of commits
    From metrics to knowledge about developers
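
The expertise formula itself is an image on the slide and is not fully legible in this transcript, so the sketch below only follows the verbal description: count commits by an author that raise or lower a component's global mark, take their ratio as the quality impact, and smooth it by the author's total activity. The logarithmic smoothing term is an assumption, not the authors' exact formula.

```python
import math

def expertise(mark_deltas):
    """mark_deltas: per-commit changes in a component's global mark
    caused by one author (positive = the mark improved)."""
    increases = sum(1 for delta in mark_deltas if delta > 0)
    decreases = sum(1 for delta in mark_deltas if delta < 0)
    quality_impact = increases / max(decreases, 1)  # qi: improving vs. worsening commits
    smoothing = math.log(len(mark_deltas) + 1)      # assumed smoothing by total commit count
    return quality_impact * smoothing

# Illustrative deltas for one author on one component
print(round(expertise([0.2, -0.1, 0.3, 0.1]), 2))  # 4.83
```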


  15. Evaluation
    ■ Two surveys performed for evaluation
    ■ Expert identification — who is currently being asked
    ■ without knowledge of Analyzr results
    ■ Proposal evaluation — who should be asked
    ■ with knowledge of results
    ■ Bounded time frame of observation
    ■ Distinguish temporary and permanent leave
    ■ In this study: 62 days
    Assessing the quality of results with surveys


  16. Expert Identification Survey
    ■ Task: Identify top 3 domain experts for front and back end
    components, prior to tool introduction
    ■ Unanimous agreement among participants on the top expert
    ■ In front end components: 55%
    ■ In back end components: 33.3%
    ■ A majority agreed on the top expert in 88% of all choices
    → Developers have a specific component expert in mind
    Who is currently being identified as domain expert


  17. Expert Identification Survey
    ■ Accuracy of Analyzr predictions for first choice
    of domain expert vs intuitive developer picks
    ■ Front and back end combined: 47.37% match
    ■ Back end: 71.43% match
    ■ Front end: 50% miss
    Comparing Analyzr predictions to intuitive survey data


  18. Analyzr Proposal Evaluation
    ■ Developers were asked to rate the first, second, and third choice
    of component expert suggested by Analyzr
    ■ Scale: strongly disagree (0), disagree (1), agree (2), strongly agree (3)
    ■ Back end: 100% agreement (87.5% strongly agree)
    ■ Front end: 90% agreement (48.5% strongly agree)
    Survey on what developers think of suggestions


  19. Summary & Conclusion
    ■ Feasibility of identifying experts using code complexity
    ■ Algorithmically identified experts differed from intuitive selections
    ■ Algorithmically identified experts rated as accurate in 90% of cases
    → Evidence for non-obvious component experts,
    i.e. “hidden experts”
    → Asking for the “guru” might not be ideal,
    might simply get you the default person
    Take-away messages
    [email protected] @chrisma0


  20. Image Credits
    In order of appearance
    ■ Recruitment by Gerald Wildmoser from the Noun Project (CC BY 3.0 US)
    ■ Truck by Mello from the Noun Project (CC BY 3.0 US)
    ■ questions by Gregor Cresnar from the Noun Project (CC BY 3.0 US)
    ■ Target by Arthur Shlain from the Noun Project (CC BY 3.0 US)
    ■ Search Code by icon 54 from the Noun Project (CC BY 3.0 US)
    ■ Difficulty Gauge by Thanh Nguyen from the Noun Project (CC BY 3.0 US)
    ■ puzzles by Kirby Wu from the Noun Project (CC BY 3.0 US)
    ■ clipboard by David from the Noun Project (CC BY 3.0 US)
    ■ Seo expert by H Alberto Gongora from the Noun Project (CC BY 3.0 US)
    ■ Idea by Gilbert Bages from the Noun Project (CC BY 3.0 US)
