
Should I Bug You? Identifying Domain Experts in Software Projects Using Code Complexity Metrics


Any sufficiently complex software system has experts who have a deeper understanding of parts of the system than others do.
However, it is not always clear who these experts are and which particular parts of the system they can provide help with.

We propose a framework to elicit the expertise of developers and recommend experts by analyzing how code complexity measures develop over time, both per author and per component.
Teams can use this approach to detect those parts of the software for which no or only a few experts currently exist, and can take preventive action to keep collective code knowledge and ownership high.

We applied the approach at a medium-sized company.
The results were evaluated with a survey comparing developers’ perceived expertise with the computed expertise.
We show that aggregated code metrics can be used to identify experts for different software components.
The identified experts were rated as acceptable candidates by developers in over 90% of all cases.

Christoph Matthies

July 28, 2017



Transcript

  1. Hasso Plattner Institute
    University of Potsdam, Germany
    [email protected]
    @chrisma0
    Should I Bug You?
    Identifying Domain Experts in Software Projects
    Using Code Complexity Metrics
    Ralf Teusner, Christoph Matthies, Philipp Giese
    QRS’17, Prague, July 2017



  2. Background
    Truck Factor
    The number of people on your team who have to
    be hit with a truck before the project is in serious trouble. [1]
    [1] Michael Bowler. “Truck Factor”. May 15, 2005.
    http://www.agileadvice.com/2005/05/15/agilemanagement/truck-factor/
    ■ Any system develops domain experts over time
    ■ Reliance on a few domain “gurus” → low Truck Number
    ■ Low collective code ownership
    ■ Can lead to Conway’s Law

    Motivation


  3. Research Question
    ■ Who should I ask when I’m in need of assistance?
    ■ Who is most qualified to write the documentation?
    ■ Who is most qualified to review this piece of code?
    ■ In which areas can knowledge sharing be improved?
    The knowledge we seek
    Who is the domain expert
    for which part of the software?


  4. Challenges & Goals
    ■ Developers are busy
    ■ Project documentation is likely out of date
    ■ Avoid overhead of documenting domain expertise
    ■ Idea: Use already existing artifacts, i.e. code
    ■ Analyze code to attribute expertise to developers
    How can we find the gurus without “bugging” them?


  5. Code Analysis
    ■ Apply proven complexity metrics to code
    ■ Chosen on a case-by-case basis; no single set of metrics fits all contexts
    ■ Consider the team’s familiarity with the chosen metrics
    ■ In this case study:
    ■ Lines of code
    ■ Efferent coupling (fan-out) &
    afferent coupling (fan-in)
    ■ Cyclomatic complexity
    ■ Halstead difficulty & volume
    From Code to Domain Expertise


  6. Cyclomatic Complexity
    Measurement of the number of
    linearly independent paths through
    a program's source code
    CC = #Edges − #Nodes + 2*#Components
    Example:
    9 edges, 8 nodes, 1 connected component.
    Cyclomatic complexity: 9 - 8 + 2*1 = 3
    aka McCabe Complexity [2]
    [Control flow graph of the example, with nodes Start, While, If, A, B, C, D, End]
    [2] T. J. McCabe. “A complexity measure,” IEEE Transactions on
    Software Engineering, no. 4. pp. 308–320. 1976.
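
The formula on the slide is simple enough to compute directly. A minimal Python sketch using the slide's example graph; the function name is illustrative and not part of Analyzr:

```python
def cyclomatic_complexity(num_edges, num_nodes, num_components=1):
    """McCabe's cyclomatic complexity: CC = E - N + 2P."""
    return num_edges - num_nodes + 2 * num_components

# Example from the slide: 9 edges, 8 nodes, 1 connected component -> 3
assert cyclomatic_complexity(9, 8, 1) == 3
```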


  7. Halstead Difficulty & Volume
    Idea: complexity based on the numbers of operators
    (e.g. reserved words) and operands (e.g. variables)
    ■ η1, N1: number of distinct and total operators
    ■ η2, N2: number of distinct and total operands
    ■ Volume = N × log2(η), with program length N = N1 + N2
    and vocabulary size η = η1 + η2
    ■ Difficulty = (η1 / 2) × (N2 / η2)
    A subset of the Halstead complexity measures [3]
    [3] Halstead, Maurice H. “Elements of Software Science”. Amsterdam:
    Elsevier North-Holland, Inc. 1977. ISBN 0-444-00205-7.
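
These are the standard Halstead definitions; the following small Python sketch spells them out. The example counts are made up, and the functions are not part of Analyzr:

```python
import math

def halstead_volume(n1, N1, n2, N2):
    """Volume = N * log2(eta): length N = N1 + N2, vocabulary eta = n1 + n2."""
    vocabulary = n1 + n2   # distinct operators + distinct operands
    length = N1 + N2       # total operators + total operands
    return length * math.log2(vocabulary)

def halstead_difficulty(n1, N1, n2, N2):
    """Difficulty = (n1 / 2) * (N2 / n2)."""
    return (n1 / 2) * (N2 / n2)

# Illustrative counts: 10 distinct / 40 total operators, 7 distinct / 25 total operands
print(round(halstead_volume(10, 40, 7, 25), 1))      # 265.7
print(round(halstead_difficulty(10, 40, 7, 25), 2))  # 17.86
```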


  8. Fan-In & Fan-Out Metrics
    aka Efferent & Afferent Coupling [4]
    ■ Efferent coupling (Ce, fan-out): number of elements that a
    code element depends upon
    ■ Afferent coupling (Ca, fan-in): number of elements that
    depend on a code element
    [4] S. Henry and D. Kafura. “Software structure metrics based on information flow”.
    IEEE Transactions on Software Engineering, no. 5. pp. 510–518. 1981.
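
A small sketch of how the two coupling counts fall out of a dependency map; the element names are invented for illustration and this is not Analyzr's actual implementation:

```python
# {element: set of elements it depends on}; names are made up
dependencies = {
    "api":     {"billing", "auth"},
    "billing": {"db", "auth"},
    "auth":    {"db"},
    "db":      set(),
}

def efferent_coupling(deps):
    """Ce (fan-out): how many elements each element depends upon."""
    return {elem: len(targets) for elem, targets in deps.items()}

def afferent_coupling(deps):
    """Ca (fan-in): how many elements depend on each element."""
    ca = {elem: 0 for elem in deps}
    for targets in deps.values():
        for target in targets:
            ca[target] += 1
    return ca

print(efferent_coupling(dependencies))  # {'api': 2, 'billing': 2, 'auth': 1, 'db': 0}
print(afferent_coupling(dependencies))  # {'api': 0, 'billing': 1, 'auth': 2, 'db': 2}
```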


  9. Metrics in a Real Project
    Changes in complexity measures related to real-world project events
    [Chart: complexity metrics over the project history, with a “Start-Up” phase (around 2011) and a later maintenance phase annotated]


  10. Analyzr Framework
    Analyzing every commit of a project


  11. Analyzr Framework
    Details for a single commit
    [Screenshot: per-commit values for cyclomatic complexity, Halstead volume, Halstead difficulty, fan-in, fan-out, and source lines of code]
    https://github.com/firebug/firebug/commit/076da997e6bc0cb14b27afcc2d845c730de52fcf


  12. Analyzr Architecture
    Under the Hood
    [Architecture diagram: Python & Django core, Git & SVN repository backends, metrics computed via JHawk (Java) and Complexity Report (JS)]
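
The slides do not show Analyzr's code, so the following is only a rough sketch of the per-commit analysis loop such an architecture implies. It uses GitPython for the Git side (SVN is omitted), and `run_external_metrics` is a hypothetical stand-in for the calls to JHawk or Complexity Report:

```python
import git  # GitPython

def run_external_metrics(worktree_path):
    """Hypothetical stand-in: invoke a metrics tool (e.g. JHawk for Java,
    Complexity Report for JS) on the checked-out worktree and parse its output."""
    raise NotImplementedError

def analyze_history(repo_path, branch="master"):
    """Walk every commit on a branch and record metric values per commit."""
    repo = git.Repo(repo_path)
    results = []
    for commit in repo.iter_commits(branch):
        repo.git.checkout(commit.hexsha)  # put the worktree at this commit
        results.append({
            "sha": commit.hexsha,
            "author": commit.author.email,
            "date": commit.committed_datetime,
            "metrics": run_external_metrics(repo_path),
        })
    return results
```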


  13. Metric Aggregation
    ■ Squale: bounded, continuous scale
    for comparison of metric values [5]
    ■ Combines low-level marks (raw metric values)
    into individual marks (IM)
    ■ IMs are mapped to a unified scale (0 to 3)
    using thresholds determined by experts [6]
    ■ IMs are then aggregated (weighted) into a global mark
    The Software Quality Enhancement (Squale) Model
    [5] Mordal-Manet et al. "The squale model—A practice-based industrial quality model."
    IEEE International Conference on Software Maintenance. 2009.
    [6] Balmas et al. “Software metrics for Java and C++ practices”. Research Report, 44 pp. 2010.
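
To make the aggregation step concrete, here is a simplified sketch of the idea, not the exact Squale formulas from [5]/[6]: raw values are turned into 0 to 3 individual marks via expert-chosen thresholds, then combined with weights into a global mark. The thresholds and weights below are placeholders.

```python
def individual_mark(value, thresholds):
    """Map a raw metric value to an individual mark on the 0-3 scale.
    `thresholds` are the expert-chosen limits for marks 3, 2 and 1
    (lower raw values are assumed to be better)."""
    for mark, limit in zip((3, 2, 1), thresholds):
        if value <= limit:
            return mark
    return 0

def global_mark(weighted_marks):
    """Weighted combination of individual marks into a global mark.
    (The real Squale model penalizes bad marks more strongly than
    a plain weighted average does.)"""
    total = sum(weight for _, weight in weighted_marks)
    return sum(mark * weight for mark, weight in weighted_marks) / total

cc_mark = individual_mark(12, thresholds=(5, 10, 20))       # cyclomatic complexity -> 1
loc_mark = individual_mark(80, thresholds=(100, 300, 600))  # lines of code -> 3
print(round(global_mark([(cc_mark, 2.0), (loc_mark, 1.0)]), 2))  # 1.67
```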


  14. Expertise Extraction
    ■ Determine changes in code metrics, i.e. deltas,
    for each developer over time
    ■ Identify the influence of each commit on a component’s global mark
    ■ Expertise(a): ratio of commits by author a that increase vs. decrease
    marks (quality impact, qi), smoothed by the author’s total number of commits
    From metrics to knowledge about developers
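
The expertise formula itself is an image on the slide and is not fully legible in this transcript, so the sketch below only follows the verbal description: count commits by an author that raise or lower a component's global mark, take their ratio as the quality impact, and smooth it by the author's total activity. The logarithmic smoothing term is an assumption, not the authors' exact formula.

```python
import math

def expertise(mark_deltas):
    """mark_deltas: per-commit changes in a component's global mark
    caused by one author (positive = the mark improved)."""
    increases = sum(1 for delta in mark_deltas if delta > 0)
    decreases = sum(1 for delta in mark_deltas if delta < 0)
    quality_impact = increases / max(decreases, 1)  # qi: improving vs. worsening commits
    smoothing = math.log(len(mark_deltas) + 1)      # assumed smoothing by total commit count
    return quality_impact * smoothing

# Illustrative deltas for one author on one component
print(round(expertise([0.2, -0.1, 0.3, 0.1]), 2))  # 4.83
```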


  15. Evaluation
    ■ Two surveys performed for evaluation
    ■ Expert identification — who is currently being asked
    ■ without knowledge of Analyzr results
    ■ Proposal evaluation — who should be asked
    ■ with knowledge of results
    ■ Bounded time frame of observation
    ■ Distinguish temporary and permanent leave
    ■ In this study: 62 days
    Assessing the quality of results with surveys


  16. Expert Identification Survey
    ■ Task: Identify top 3 domain experts for front and back end
    components, prior to tool introduction
    ■ Unanimous agreement among participants on the top expert
    ■ In front end components: 55%
    ■ In back end components: 33.3%
    ■ A majority agreed on the top expert in 88% of all choices
    → Developers have a specific component expert in mind
    Who is currently being identified as domain expert


  17. Expert Identification Survey
    ■ Accuracy of Analyzr predictions for first choice
    of domain expert vs intuitive developer picks
    ■ Front and back end combined: 47.37% match
    ■ Back end: 71.43% match
    ■ Front end: 50% miss
    Comparing Analyzr predictions to intuitive survey data


  18. Analyzr Proposal Evaluation
    ■ Developers were asked to rate the first, second, and third choice
    of component expert suggested by Analyzr
    ■ Scale: strongly disagree (0), disagree (1), agree (2), strongly agree (3)
    ■ Back end: 100% agreement (87.5% strongly agree)
    ■ Front end: 90% agreement (48.5% strongly agree)
    Survey on what developers think of suggestions


  19. Summary & Conclusion
    ■ Feasibility of identifying experts using code complexity
    ■ Algorithmically identified experts differed from intuitive selections
    ■ Algorithmically identified experts rated as accurate in 90% of cases
    → Evidence for non-obvious component experts,
    i.e. “hidden experts”
    → Asking for the “guru” might not be ideal,
    might simply get you the default person
    Take-away messages
    [email protected] @chrisma0


  20. Image Credits
    In order of appearance
    ■ Recruitment by Gerald Wildmoser from the Noun Project (CC BY 3.0 US)
    ■ Truck by Mello from the Noun Project (CC BY 3.0 US)
    ■ questions by Gregor Cresnar from the Noun Project (CC BY 3.0 US)
    ■ Target by Arthur Shlain from the Noun Project (CC BY 3.0 US)
    ■ Search Code by icon 54 from the Noun Project (CC BY 3.0 US)
    ■ Difficulty Gauge by Thanh Nguyen from the Noun Project (CC BY 3.0 US)
    ■ puzzles by Kirby Wu from the Noun Project (CC BY 3.0 US)
    ■ clipboard by David from the Noun Project (CC BY 3.0 US)
    ■ Seo expert by H Alberto Gongora from the Noun Project (CC BY 3.0 US)
    ■ Idea by Gilbert Bages from the Noun Project (CC BY 3.0 US)
