A Novel Approach for Estimating Truck Factors (ICPC 2016)

Truck Factor (TF) is a metric proposed by the agile community to identify concentration of knowledge in software development environments. It is the minimal number of developers that have to be hit by a truck (or quit) before a project is incapacitated. In other words, TF measures how prepared a project is to deal with developer turnover. Despite its clear relevance, few studies explore this metric. There is no consensus about how to calculate it, and no supporting evidence backing estimates for systems in the wild. To mitigate both issues, we propose a novel, automated approach for estimating TF values, which we execute against a corpus of 133 popular projects on GitHub. We then survey developers to assess the reliability of our results. Among other findings, the majority of our target systems (65%) have TF <= 2. Surveying developers from 67 target systems provides confidence in our estimates: in 84% of the valid answers we collect, developers agree or partially agree that the TF authors are the main authors of their systems; in 53%, we receive a positive or partially positive answer regarding our estimated truck factors.

ASERG, DCC, UFMG

May 17, 2016

Transcript

  1. Guilherme Avelino*†, Leonardo Passos‡, Andre Hora* and
    Marco Tulio Valente*
    *Federal University of Minas Gerais (UFMG), Brazil
    †Federal University of Piaui (UFPI), Brazil
    ‡University of Waterloo, Canada
    ICPC 2016
    A Novel Approach for Estimating
    Truck Factors

  2. “What if you saw this posted tomorrow:
    Guido’s unexpected death has come as a shock
    to us all. Disgruntled members of the Tcl mob
    are suspected, but no smoking gun has been
    found...”
    Python’s mailing list discussion, 1994.
    2

  3. 3
    Truck Factor
    “The number of people on your team that
    have to be hit by a truck (or quit) before the
    project is in serious trouble”

  4. 4
    Existing Approaches
    Zazworka et al. (ESEM 2010) and Ricca et al. (ESEM 2010)

  5. 5
    Existing Approaches
    Simplistic code ownership metric
    Zazworka et al. (ESEM 2010) and Ricca et al. (ESEM 2010)

  6. 6
    Existing Approaches
    Simplistic code ownership metric
    Only for small teams (<40 developers)
    Zazworka et al. (ESEM 2010) and Ricca et al. (ESEM 2010)

  7. 7
    Existing Approaches
    Simplistic code ownership metric
    Only for small teams (<40 developers)
    No empirical evidence from real-world systems
    Zazworka et al. (ESEM 2010) and Ricca et al. (ESEM 2010)

  8. 8
    Contributions

  9. 9
    Contributions
    New approach

  10. 10
    Contributions
    Truck Factor
    Systems
    TF estimates
    New approach

  11. 11
    Contributions
    Truck Factor
    Systems
    TF estimates
    New approach
    Survey with developers

  12. 12
    Contributions
    Truck Factor
    Systems
    TF estimates
    Survey with developers
    Best practices
    New approach

  13. Proposed Approach
    13

  14. Proposed Approach
    14

  15. Proposed Approach
    15

  16. Proposed Approach
    16

  17. Proposed Approach
    17

  18. STEP 4 – Define Authorship
    18

  19. STEP 4 – Define Authorship
    19
    Degree of Authorship (DOA) metric

  20. STEP 4 – Define Authorship
    20
    Degree of Authorship (DOA) metric
    Degree of Knowledge (DOK)
    DOK = DOI + DOA
    Fritz et al. (TOSEM 2014)

  21. STEP 4 – Define Authorship
    21
    Degree of Authorship (DOA) metric
    Authors
    Degree of Knowledge (DOK)
    DOK = DOI + DOA
    Fritz et al. (TOSEM 2014)
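
As a rough illustration of Step 4, the sketch below scores each developer's degree of authorship (DOA) on a single file and keeps those who pass two thresholds. The slides do not show the formula: the linear model and its weights are quoted from memory of Fritz et al. and should be checked against that paper, and the thresholds (normalized DOA above 0.75, absolute DOA of at least 3.293), the commit-list input, and all function names are assumptions made for this example.

```python
import math

# Assumed weights of the linear DOA model; verify against Fritz et al.
INTERCEPT, W_FA, W_DL, W_AC = 3.293, 1.098, 0.164, 0.321


def doa(first_author, deliveries, others_changes):
    """Degree of authorship of one developer on one file (assumed formula)."""
    return (INTERCEPT
            + W_FA * int(first_author)
            + W_DL * deliveries
            - W_AC * math.log(1 + others_changes))


def file_authors(history, norm_threshold=0.75, abs_threshold=INTERCEPT):
    """Return the developers treated as authors of a file.

    history: non-empty list of developer names, one per commit touching
    the file, oldest first. The first commit is taken as the file's
    creation (an assumption of this sketch).
    """
    creator, changes = history[0], history[1:]
    scores = {}
    for dev in set(history):
        own = changes.count(dev)       # later changes by this developer
        others = len(changes) - own    # later changes by everyone else
        scores[dev] = doa(dev == creator, own, others)
    best = max(scores.values())
    # Keep developers close to the top score and above the absolute floor.
    return {dev for dev, s in scores.items()
            if s / best > norm_threshold and s >= abs_threshold}


example_history = ["ana", "ana", "bob", "ana"]  # made-up commit log
example_authors = file_authors(example_history)
print(example_authors)
```

Normalizing by the best score on the file keeps the comparison local to that file, so a heavily edited file and a rarely touched one are judged on the same relative scale.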

  22. Proposed Approach
    22

  23. STEP 5 – Estimate Truck Factor
    23
    [Figure: the system's files, each labeled with its author (A1 to An), beside a chart of the number of files per author (A1, A2, A3, A4, ..., An)]

  24. STEP 5 – Estimate Truck Factor
    24
    [Figure: the system's files, each labeled with its author (A1 to An), beside a chart of the number of files per author (A1, A2, A3, A4, ..., An); one X mark]

  25. STEP 5 – Estimate Truck Factor
    25
    [Figure: the system's files, each labeled with its author (A1 to An), beside a chart of the number of files per author (A1, A2, A3, A4, ..., An); two X marks]

  26. STEP 5 – Estimate Truck Factor
    26
    [Figure: the system's files, each labeled with its author (A1 to An), beside a chart of the number of files per author (A1, A2, A3, A4, ..., An); three X marks and a 50% marker]

  27. STEP 5 – Estimate Truck Factor
    27
    [Figure: the system's files, each labeled with its author (A1 to An), beside a chart of the number of files per author (A1, A2, A3, A4, ..., An); three X marks and a 50% marker]
    TF = 3
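
The elimination walked through in Step 5 can be sketched as a greedy loop: repeatedly remove the author with the most files until more than half of the files have no remaining author, then report how many authors were removed. The 50% cutoff follows the marker on the slide; the strict "more than half" stopping rule, the alphabetical tie-breaking, and the data layout (a hypothetical `file_authors` mapping from file name to a set of author names) are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def truck_factor(file_authors, orphan_ratio=0.5):
    """Greedy truck factor estimate.

    file_authors: dict mapping each file name to the set of its authors.
    Returns (truck_factor, removed_authors_in_order).
    """
    remaining = {f: set(devs) for f, devs in file_authors.items()}
    total = len(remaining)
    removed = []
    while total:
        # Files whose authors have all been removed are orphans.
        orphans = sum(1 for devs in remaining.values() if not devs)
        if orphans > orphan_ratio * total:
            break
        counts = Counter(d for devs in remaining.values() for d in devs)
        # Author with the most files; ties broken alphabetically.
        top = max(sorted(counts), key=counts.get)
        removed.append(top)
        for devs in remaining.values():
            devs.discard(top)
    return len(removed), removed


# Toy data mirroring the slide: A1 owns 5 files, A2 and A3 own 2 each,
# and A4..A9 plus An own 1 file each (16 files, one author per file).
owners = ["A1"] * 5 + ["A2"] * 2 + ["A3"] * 2 + \
         ["A4", "A5", "A6", "A7", "A8", "A9", "An"]
toy = {f"file{i}": {dev} for i, dev in enumerate(owners)}

tf, removed = truck_factor(toy)
print(tf, removed)  # 3 ['A1', 'A2', 'A3']
```

On this toy input, removing A1, A2, and A3 orphans 9 of 16 files, which is the first point past the 50% mark, so the sketch returns the same TF = 3 as the slide.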

  28. 28
    Validation

  29. 29
    Validation
    Selection of Target Subjects

  30. 30
    Validation
    Selection of Target Subjects
    Truck Factor Estimation

  31. 31
    Validation
    Selection of Target Subjects
    Truck Factor Estimation
    Survey with Developers

  32. 32
    Selection of Target Subjects

  33. 33
    Truck Factor Results
    [Figure: chart of truck factor values; axis ticks 0, 5, 10, 20, 30, 40, 50, 60]

  34. 34
    Truck Factor Results
    TF ≤ 2: 65% of systems
    [Figure: chart of truck factor values; axis ticks 0, 5, 10, 20, 30, 40, 50, 60]

  35. 35
    Truck Factor Results
    [Figure: chart of truck factor values; axis ticks 0, 5, 10, 20, 30, 40, 50, 60; highlighted value 57]
    TF = 57

  36. 36
    Truck Factor Results
    [Figure: chart of truck factor values; axis ticks 0, 5, 10, 20, 30, 40, 50, 60; highlighted value 18]
    TF = 57
    TF = 18

  37. 37
    Truck Factor Results
    [Figure: chart of truck factor values; axis ticks 0, 5, 10, 20, 30, 40, 50, 60; highlighted value 12]
    TF = 57
    TF = 18
    TF = 12

  38. 38
    Truck Factor Results
    [Figure: chart of truck factor values; axis ticks 0, 5, 10, 20, 30, 40, 50, 60; highlighted value 11]
    TF = 57
    TF = 18
    TF = 12
    TF = 11

  39. 39
    Truck Factor Results
    [Figure: chart of truck factor values; axis ticks 0, 5, 10, 20, 30, 40, 50, 60; highlighted value 9]
    TF = 57
    TF = 18
    TF = 12
    TF = 11
    TF = 9

  40. 40
    Survey Application
    GitHub issues
    Opened: 114
    Not answered: 47
    Discarded: 5
    Answered: 62

  41. 41
    Survey Application
    GitHub issues
    Opened: 114
    Response ratio: 54%
    Not answered: 47
    Discarded: 5
    Answered: 62

  42. 42
    Survey Design

  43. 43
    Survey Design
    TF concept

  44. 44
    Survey Design
    TF concept
    System’s TF

  45. 45
    Survey Design
    TF concept
    System’s TF
    Top-ranked authors

  46. 46
    Survey Design
    TF concept
    System’s TF
    Top-ranked authors
    Questions

  47. Question 1. Do developers agree that the top-ranked
    authors are the main developers of their projects?
    Agree: 50%
    Partially: 29%
    Disagree: 15%
    Unclear: 6%
    47

  48. Question 1. Do developers agree that the top-ranked
    authors are the main developers of their projects?
    Agree: 50%
    Partially: 29%
    Disagree: 15%
    Unclear: 6%
    “Yes, that’s me.”
    (bjorn/tiled)
    48

  49. Question 1. Do developers agree that the top-ranked
    authors are the main developers of their projects?
    Agree: 50%
    Partially: 29%
    Disagree: 15%
    Unclear: 6%
    “Yes, that’s me.”
    (bjorn/tiled)
    “Yes, but I think DevX should
    be in that list too.”
    (Respect/Validation)
    49

  50. Question 1. Do developers agree that the top-ranked
    authors are the main developers of their projects?
    Agree: 50%
    Partially: 29%
    Disagree: 15%
    Unclear: 6%
    “Yes, that’s me.”
    (bjorn/tiled)
    “Yes, but I think DevX should be
    in that list too.”
    (Respect/Validation)
    “No, DevY hasn’t been
    contributing for a while now.
    I’ve taken over what he was
    doing.”
    (backup/backup)
    50

  51. Question 2. Do developers agree that their projects
    will be in trouble if they lose the truck factor authors?
    Agree: 39%
    Partially: 10%
    Disagree: 43%
    Unclear: 8%
    51

  52. Question 2. Do developers agree that their projects
    will be in trouble if they lose the truck factor authors?
    Agree: 39%
    Partially: 10%
    Disagree: 43%
    Unclear: 8%
    “If both of us left, the project
    would be kind of unmaintained.”
    (SFTtech/openage)
    52

  53. Question 2. Do developers agree that their projects
    will be in trouble if they lose the truck factor authors?
    Agree: 39%
    Partially: 10%
    Disagree: 43%
    Unclear: 8%
    “If both of us left, the project
    would be kind of unmaintained.”
    (SFTtech/openage)
    “Somewhat agree. A loss in one
    area would mean a temporary
    dip in maintenance of that area
    until someone else stepped in.”
    (saltstack/salt)
    53

  54. Question 2. Do developers agree that their projects
    will be in trouble if they lose the truck factor authors?
    Agree: 39%
    Partially: 10%
    Disagree: 43%
    Unclear: 8%
    “If both of us left, the project
    would be kind of unmaintained.”
    (SFTtech/openage)
    “Somewhat agree. A loss in one
    area would mean a temporary dip
    in maintenance of that area until
    someone else stepped in.”
    (saltstack/salt)
    “No, I think others could take over.”
    (silexphp/Silex)
    54
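
The survey percentages on these slides can be tied back to the headline figures with a little arithmetic. The sketch below assumes that 62 of the 114 opened issues count as answered and that "valid answers" means every answer not classified as Unclear; both readings are assumptions, chosen because they reproduce the 54% response ratio and the 84% and 53% agreement figures quoted in the abstract. The function name is hypothetical.

```python
def share_of_valid(agree, partially, unclear):
    """Percentage of agree + partially among answers that are not unclear.

    All three arguments are percentages of the full set of answers.
    """
    return 100 * (agree + partially) / (100 - unclear)


response_ratio = 100 * 62 / 114        # answered issues / opened issues
q1 = share_of_valid(50, 29, 6)         # Question 1: top-ranked authors
q2 = share_of_valid(39, 10, 8)         # Question 2: project in trouble

print(round(response_ratio), round(q1), round(q2))  # 54 84 53
```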

  55. Question 3. What are the development practices that
    can attenuate the loss of top-ranked authors?
    “I’d say that the vibrant
    community is the reason for
    it.”
    (rails/rails)
    “The project is pretty well-
    documented.”
    (wp-cli/wp-cli)
    “The people you listed are paid
    to work on the project, along
    with a number of others.”
    (ipython/ipython)
    55

  56. Question 3. What are the development practices that
    can attenuate the loss of top-ranked authors?
    “I’d say that the vibrant
    community is the reason for
    it.”
    (rails/rails)
    “The project is pretty well-
    documented.”
    (wp-cli/wp-cli)
    “The people you listed are paid
    to work on the project, along
    with a number of others.”
    (ipython/ipython)
    56

  57. Question 3. What are the development practices that
    can attenuate the loss of top-ranked authors?
    “I’d say that the vibrant
    community is the reason
    for it.”
    (rails/rails)
    “The project is pretty well-
    documented.”
    (wp-cli/wp-cli)
    “The people you listed are paid
    to work on the project, along
    with a number of others.”
    (ipython/ipython)
    57

  58. Question 3. What are the development practices that
    can attenuate the loss of top-ranked authors?
    “I’d say that the vibrant
    community is the reason for
    it.”
    (rails/rails)
    “The project is pretty well-
    documented.”
    (wp-cli/wp-cli)
    “The people you listed are
    paid to work on the project,
    along with a number of
    others.”
    (ipython/ipython)
    58

  59. 59
    Summary

  60. 60
    Summary
    We presented a new approach to compute
    truck factor.

  61. 61
    Summary
    We presented a new approach to compute
    truck factor.
    Most of the systems (65%) have TF ≤ 2.

  62. 62
    Summary
    We presented a new approach to compute
    truck factor.
    Most of the systems (65%) have TF ≤ 2.
    We assess the reliability and limitations
    of our approach by surveying the systems'
    developers.

  63. A Novel Approach for Estimating
    Truck Factors
    http://aserg.labsoft.dcc.ufmg.br/truckfactor
    Thank you for your attention!
