Slide 1

Should I Bug You? Identifying Domain Experts in Software Projects Using Code Complexity Metrics
Ralf Teusner, Christoph Matthies, Philipp Giese
Hasso Plattner Institute, University of Potsdam, Germany
[email protected] · @chrisma0
QRS'17, Prague, July 2017

Slide 2

Motivation
Background: Truck Factor
"The number of people on your team who have to be hit with a truck before the project is in serious trouble." [1]
■ Any system develops domain experts over time
■ Domain "gurus" → low Truck Factor
■ Low collective code ownership
■ Can lead to Conway's Law
[1] Michael Bowler. "Truck Factor". May 15, 2005. http://www.agileadvice.com/2005/05/15/agilemanagement/truck-factor/

Slide 3

Research Question
The knowledge we seek: who is the domain expert for which part of the software?
■ Who should I ask when I'm in need of assistance?
■ Who is most qualified to write the documentation?
■ Who is most qualified to review this piece of code?
■ In which areas can knowledge sharing be improved?

Slide 4

Challenges & Goals
How can we find the gurus without "bugging" them?
■ Developers are busy
■ Project documentation is likely out of date
■ Avoid the overhead of documenting domain expertise
■ Idea: use already existing artifacts, i.e. code
■ Analyze code to attribute expertise to developers

Slide 5

Code Analysis
From Code to Domain Expertise
■ Apply proven complexity metrics to code
■ Case-by-case basis: no set of metrics can fit all contexts
■ Consider knowledge of metrics within a software team
■ In this case study:
  ■ Lines of code
  ■ Efferent coupling (Fan-Out) & afferent coupling (Fan-In)
  ■ Cyclomatic Complexity
  ■ Halstead difficulty & volume

Slide 6

Cyclomatic Complexity
aka McCabe Complexity [2]
A measure of the number of linearly independent paths through a program's source code:
CC = #Edges − #Nodes + 2 × #Components
Example: a control-flow graph with 9 edges, 8 nodes, and 1 connected component has cyclomatic complexity 9 − 8 + 2 × 1 = 3.
[Figure: example control-flow graph with Start, While, If, A, B, C, D, and End nodes]
[2] T. J. McCabe. "A Complexity Measure". IEEE Transactions on Software Engineering, no. 4, pp. 308–320, 1976.
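The calculation is easy to reproduce. Below is a minimal sketch (not from the paper) that computes CC = E − N + 2P from an edge list; the example edge list is a guess at the slide's figure, chosen only to match its counts of 9 edges, 8 nodes, and 1 component.

from collections import defaultdict

def cyclomatic_complexity(edges):
    """edges: list of (source, target) pairs of a control-flow graph."""
    nodes = {n for edge in edges for n in edge}
    # Count (weakly) connected components via an undirected traversal.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbours[node] - seen)
    return len(edges) - len(nodes) + 2 * components

# Hypothetical graph matching the slide: 9 edges, 8 nodes, 1 component.
example = [("Start", "While"), ("While", "A"), ("A", "If"),
           ("If", "B"), ("If", "C"), ("B", "D"), ("C", "D"),
           ("D", "While"), ("While", "End")]
assert cyclomatic_complexity(example) == 3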

Slide 7

Halstead Difficulty & Volume
A subset of the Halstead complexity measures [3]
Idea: complexity based on the numbers of operators (e.g. reserved words) and operands (e.g. variables)
■ η1, N1: number of distinct and total operators
■ η2, N2: number of distinct and total operands
■ Volume = N × log2(η), with program length N = N1 + N2 and vocabulary size η = η1 + η2
■ Difficulty = (η1 / 2) × (N2 / η2)
[3] Maurice H. Halstead. "Elements of Software Science". Amsterdam: Elsevier North-Holland, 1977. ISBN 0-444-00205-7.
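Both measures are straightforward to compute once a parser has separated operators from operands. A minimal sketch using the standard Halstead formulas (the token classification itself is the hard part and is omitted here):

import math

def halstead(operators, operands):
    """operators/operands: lists of tokens as they occur in the code."""
    eta1, n1 = len(set(operators)), len(operators)  # distinct / total operators
    eta2, n2 = len(set(operands)), len(operands)    # distinct / total operands
    vocabulary = eta1 + eta2                        # η = η1 + η2
    length = n1 + n2                                # N = N1 + N2
    volume = length * math.log2(vocabulary)         # V = N × log2(η)
    difficulty = (eta1 / 2) * (n2 / eta2)           # D = (η1/2) × (N2/η2)
    return volume, difficulty

# Toy example: the tokens of `a = b + b`.
volume, difficulty = halstead(["=", "+"], ["a", "b", "b"])
print(volume, difficulty)  # 10.0 1.5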

Slide 8

Fan-In & Fan-Out Metrics
aka Afferent & Efferent Coupling [4]
■ Afferent Coupling (Ca, Fan-In): the number of elements that depend on a code element
■ Efferent Coupling (Ce, Fan-Out): the number of elements that a code element depends upon
[4] S. Henry and D. Kafura. "Software Structure Metrics Based on Information Flow". IEEE Transactions on Software Engineering, no. 5, pp. 510–518, 1981.
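Given a dependency graph, both couplings are simple in-degree and out-degree counts. A small illustrative sketch (the module names are made up):

from collections import defaultdict

def coupling(dependencies):
    """dependencies: iterable of (dependent, dependee) pairs."""
    ce = defaultdict(int)  # efferent / Fan-Out: outgoing dependencies
    ca = defaultdict(int)  # afferent / Fan-In: incoming dependencies
    for dependent, dependee in dependencies:
        ce[dependent] += 1
        ca[dependee] += 1
    return dict(ca), dict(ce)

# Hypothetical module graph: views and admin depend on models, models on orm.
ca, ce = coupling([("views", "models"), ("admin", "models"), ("models", "orm")])
print(ca)  # {'models': 2, 'orm': 1} -> models has Fan-In 2
print(ce)  # {'views': 1, 'admin': 1, 'models': 1}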

Slide 9

Metrics in a Real Project
Changes in complexity measures related to real-world project events
[Figure: complexity measures over time, annotated with the 2011 "Start-Up" phase and the subsequent maintenance phase]

Slide 10

Analyzr Framework
Analyzing every commit of a project
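For context, a minimal sketch of such a pipeline's outer loop: walking every commit of a Git repository, oldest first, so that a metric backend can be run per commit. This is illustrative only, not Analyzr's actual code.

import subprocess

def commits(repo_path):
    """Yield (sha, author, subject) for each commit, oldest first."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--reverse",
         "--format=%H%x09%an%x09%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in log.splitlines():
        sha, author, subject = line.split("\t", 2)
        yield sha, author, subject

for sha, author, subject in commits("."):
    # A real tool would inspect the tree at `sha` here and run its
    # metric backends on the changed files.
    print(sha[:8], author, subject)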

Slide 11

Analyzr Framework
Details for a single commit
[Screenshot: per-commit charts for Cyclomatic Complexity, Halstead Volume, Halstead Difficulty, Fan-In, Fan-Out, and Source Lines of Code]
https://github.com/firebug/firebug/commit/076da997e6bc0cb14b27afcc2d845c730de52fcf

Slide 12

Analyzr Architecture
Under the Hood
■ Version control backends: Git & SVN
■ Metric backends: JHawk (Java) & Complexity Report (JavaScript)
■ Core: Python & Django

Slide 13

Metric Aggregation
The Software Quality Enhancement (Squale) Model
■ Squale: a bounded, continuous scale for comparing metric values [5]
■ Combines low-level marks (raw metric values) into individual marks (IM)
■ IMs are mapped to a unified scale (from 0 to 3); the mappings were determined by experts [6]
■ IMs are then aggregated (weighted) to form a global mark
[5] Mordal-Manet et al. "The Squale Model: A Practice-Based Industrial Quality Model". IEEE International Conference on Software Maintenance, 2009.
[6] Balmas et al. "Software Metrics for Java and C++ Practices". Research Report, 44 pp., 2010.
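A heavily simplified sketch of this style of aggregation. The real Squale model uses expert-defined mapping functions and weightings per metric [5, 6]; all thresholds and weights below are hypothetical placeholders.

def individual_mark(value, good, bad):
    """Linearly map a raw metric value onto the 0..3 scale:
    `good` or better -> 3, `bad` or worse -> 0 (hypothetical thresholds)."""
    if good < bad:   # lower raw values are better (e.g. cyclomatic complexity)
        score = (bad - value) / (bad - good)
    else:            # higher raw values are better
        score = (value - bad) / (good - bad)
    return 3 * max(0.0, min(1.0, score))

def global_mark(marks_and_weights):
    """Weighted average of individual marks."""
    total_weight = sum(w for _, w in marks_and_weights)
    return sum(m * w for m, w in marks_and_weights) / total_weight

# Hypothetical component: average CC of 12, 300 SLOC per file.
im_cc = individual_mark(12, good=5, bad=20)         # -> 1.6
im_sloc = individual_mark(300, good=100, bad=1000)  # -> ~2.33
print(round(global_mark([(im_cc, 2.0), (im_sloc, 1.0)]), 2))  # -> 1.84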

Slide 14

Expertise Extraction
From metrics to knowledge about developers
■ Determine changes in code metrics, i.e. deltas, for each developer over time
■ Identify the influence of each commit on a component's global mark
■ Expertise(a): the ratio of an author's commits that increase vs. decrease the marks (quality impact, qi), smoothed by the author's total number of commits
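The slide's exact Expertise(a) formula is not reproduced here, so the following is only one plausible reading of the description above: the share of an author's mark-improving commits, with additive smoothing so that authors with very few commits do not receive extreme scores.

def expertise(mark_deltas, alpha=1.0):
    """mark_deltas: per-commit changes of a component's global mark caused
    by one author. Returns a score in (0, 1); 0.5 is neutral.
    (Speculative reading of the slide, not the paper's formula.)"""
    improving = sum(1 for d in mark_deltas if d > 0)
    worsening = sum(1 for d in mark_deltas if d < 0)
    # Additive smoothing stands in for "smoothed by total author commits".
    return (improving + alpha) / (improving + worsening + 2 * alpha)

print(expertise([+0.2, +0.1, -0.05, +0.3]))  # -> 0.666...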

Slide 15

Evaluation
Assessing the quality of results with surveys
■ Two surveys performed for evaluation:
  ■ Expert identification: who is currently being asked (without knowledge of the Analyzr results)
  ■ Proposal evaluation: who should be asked (with knowledge of the results)
■ Bounded time frame of observation, to distinguish temporary from permanent leave
  ■ In this study: 62 days

Slide 16

Expert Identification Survey
Who is currently being identified as domain expert
■ Task: identify the top 3 domain experts for front-end and back-end components, prior to tool introduction
■ Total agreement between participants on the top expert:
  ■ in front-end components: 55%
  ■ in back-end components: 33.3%
■ A majority agreed on the top expert in 88% of total choices
→ Developers have a specific component expert in mind

Slide 17

Expert Identification Survey
Comparing Analyzr predictions to intuitive survey data
■ Accuracy of Analyzr's prediction of the first-choice domain expert vs. intuitive developer picks:
  ■ front end and back end combined: 47.37% match
  ■ back end: 71.43% match
  ■ front end: 50% miss

Slide 18

Analyzr Proposal Evaluation
Survey on what developers think of the suggestions
■ Developers were asked to rate the first, second, and third choice of component expert suggested
■ Scale: strongly disagree (0), disagree (1), agree (2), strongly agree (3)
■ Back end: 100% agreement (87.5% strongly agree)
■ Front end: 90% agreement (48.5% strongly agree)

Slide 19

Summary & Conclusion
Take-away messages
■ Demonstrated the feasibility of identifying experts using code complexity metrics
■ Algorithmically identified experts differed from intuitive selections
■ Algorithmically identified experts were rated as accurate in 90% of cases
→ Evidence for non-obvious component experts, i.e. "hidden experts"
→ Asking for the "guru" might not be ideal; it might simply get you the default person
[email protected] · @chrisma0

Slide 20

Image Credits
In order of appearance
■ Recruitment by Gerald Wildmoser from the Noun Project (CC BY 3.0 US)
■ Truck by Mello from the Noun Project (CC BY 3.0 US)
■ questions by Gregor Cresnar from the Noun Project (CC BY 3.0 US)
■ Target by Arthur Shlain from the Noun Project (CC BY 3.0 US)
■ Search Code by icon 54 from the Noun Project (CC BY 3.0 US)
■ Difficulty Gauge by Thanh Nguyen from the Noun Project (CC BY 3.0 US)
■ puzzles by Kirby Wu from the Noun Project (CC BY 3.0 US)
■ clipboard by David from the Noun Project (CC BY 3.0 US)
■ Seo expert by H Alberto Gongora from the Noun Project (CC BY 3.0 US)
■ Idea by Gilbert Bages from the Noun Project (CC BY 3.0 US)