A Novel Approach for Estimating Truck Factors (ICPC 2016)

Truck Factor (TF) is a metric proposed by the agile community as a tool to identify concentration of knowledge in software development environments. It states the minimal number of developers that have to be hit by a truck (or quit) before a project is incapacitated. In other words, TF helps to measure how prepared a project is to deal with developer turnover. Despite its clear relevance, few studies explore this metric. Altogether there is no consensus about how to calculate it, and no supporting evidence backing estimates for systems in the wild. To mitigate both issues, we propose a novel (and automated) approach for estimating TF values, which we execute against a corpus of 133 popular projects on GitHub. We later survey developers as a means to assess the reliability of our results. Among other findings, the majority of our target systems (65%) have TF ≤ 2. Surveying developers from 67 target systems provides confidence in our estimates: in 84% of the valid answers we collect, developers agree or partially agree that the TF authors are the main authors of their systems; in 53% we receive a positive or partially positive answer regarding our estimated truck factors.

ASERG, DCC, UFMG

May 17, 2016

Transcript

  1. A Novel Approach for Estimating Truck Factors (ICPC 2016)

    Guilherme Avelino*†, Leonardo Passos‡, Andre Hora* and Marco Tulio Valente*. Affiliations: *Federal University of Minas Gerais (UFMG), Brazil; †Federal University of Piaui (UFPI), Brazil; ‡University of Waterloo, Canada.
  2. “What if you saw this posted tomorrow: Guido’s unexpected death has come as a shock to us all. Disgruntled members of the Tcl mob are suspected, but no smoking gun has been found...”

    Python’s mailing list discussion, 1994.
  3. Truck Factor

    “The number of people on your team that have to be hit by a truck (or quit) before the project is in serious trouble”
  4. Existing Approaches: Zazworka et al. (ESEM 2010) and Ricca et al. (ESEM 2010)

    Limitations: simplistic code ownership metric; only for small teams (<40 developers); no empirical evidence from real-world systems.
  8. Contributions

    A new approach for estimating the Truck Factor; TF estimates for a set of systems; a survey with developers; best practices.
  13. Proposed Approach

  18. STEP 4 – Define Authorship

    Authors are defined with the Degree of Authorship (DOA) metric, one component of the Degree of Knowledge (DOK): DOK = DOI + DOA, Fritz et al. (TOSEM 2014).
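The slides name the DOA metric without giving its formula. As a minimal sketch, assuming the linear model and coefficients from Fritz et al. as reported in the accompanying ICPC 2016 paper (they do not appear in this transcript and should be checked against the paper), DOA for a developer and a file can be computed from three counts: whether the developer created the file (FA), how many changes the developer made (DL), and how many changes other developers made (AC). The function names, the example data and the two author thresholds are likewise assumptions for illustration.

```python
import math

def doa(first_author: bool, deliveries: int, acceptances: int) -> float:
    """Degree of Authorship of one developer for one file.

    Assumed coefficients (not shown on the slides):
    DOA = 3.293 + 1.098*FA + 0.164*DL - 0.321*ln(1 + AC)
    """
    fa = 1 if first_author else 0
    return 3.293 + 1.098 * fa + 0.164 * deliveries - 0.321 * math.log(1 + acceptances)

def file_authors(doa_by_dev, norm_threshold=0.75, abs_threshold=3.293):
    """Developers counted as authors of a file.

    Assumed rule: DOA normalized by the file's highest DOA must exceed
    norm_threshold, and the absolute DOA must reach abs_threshold.
    """
    if not doa_by_dev:
        return set()
    best = max(doa_by_dev.values())
    return {
        dev for dev, value in doa_by_dev.items()
        if value / best > norm_threshold and value >= abs_threshold
    }

# Hypothetical file: alice created it and made 10 changes (2 by others);
# bob made 1 change (11 by others).
scores = {"alice": doa(True, 10, 2), "bob": doa(False, 1, 11)}
authors = file_authors(scores)
```

In this made-up example only alice passes both thresholds, so she would be the file's single author for the purposes of the next step.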
  22. Proposed Approach

  23. STEP 5 – Estimate Truck Factor

    [Figure: the system’s files, each labeled with its author (A1 five times, A2 twice, A3 twice, A4 to A9 and An once each), next to a ranking of the authors A1, A2, A3, A4, …, An by number of files. The top-ranked authors are crossed out one at a time; with A1, A2 and A3 crossed out the 50% mark is reached, giving TF = 3.]
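The figure for Step 5 can be read as a greedy heuristic: rank authors by number of authored files, remove the top-ranked author, and repeat until more than half of the files have no remaining author. The sketch below assumes that reading; the function name, the static ranking and the strict "more than 50%" stopping rule are assumptions, not details confirmed by the slides.

```python
from collections import Counter

def truck_factor(file_authors, orphan_limit=0.5):
    """Greedy TF estimate from a mapping of file -> set of authors."""
    total = len(file_authors)
    if total == 0:
        return 0

    def orphan_share(removed):
        # A file is an orphan once all of its authors have been removed.
        orphans = sum(1 for authors in file_authors.values() if not (authors - removed))
        return orphans / total

    removed = set()
    if orphan_share(removed) > orphan_limit:
        return 0

    # Rank authors by how many files they author (ties keep insertion order).
    counts = Counter(a for authors in file_authors.values() for a in authors)
    for author, _ in counts.most_common():
        removed.add(author)
        if orphan_share(removed) > orphan_limit:
            break
    return len(removed)

# Toy data mirroring the slide: A1 authors 5 files, A2 and A3 two each,
# and A4..A9 plus An one each (16 files, one author per file).
owners = ["A1"] * 5 + ["A2"] * 2 + ["A3"] * 2 + ["A4", "A5", "A6", "A7", "A8", "A9", "An"]
example = {f"file{i}": {owner} for i, owner in enumerate(owners)}
tf = truck_factor(example)
```

On this toy data, removing A1 orphans 5 of 16 files, removing A2 brings it to 7, and removing A3 brings it to 9 (56%), so the sketch stops at three authors, the same value shown on the slide.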
  28. Validation

    Selection of Target Subjects; Truck Factor Estimation; Survey with Developers.
  32. Selection of Target Subjects

  33. Truck Factor Results

    [Figure: distribution of TF values across the target systems, on an axis from 0 to 60.] TF ≤ 2 for 65% of the systems. Highlighted values: TF = 57, TF = 18, TF = 12, TF = 11, TF = 9.
  40. Survey Application

    GitHub issues opened: 114 (62 answered, 5 discarded, 47 not answered). Response ratio: 54%.
  42. Survey Design

    TF concept; system’s TF; top-ranked authors; questions.
  47. Question 1. Do developers agree that the top-ranked authors are the main developers of their projects?

    Agree: 50%; Partially: 29%; Disagree: 15%; Unclear: 6%. “Yes, that’s me.” (bjorn/tiled) “Yes, but I think DevX should be in that list too.” (Respect/Validation) “No, DevY hasn’t been contributing for a while now. I’ve taken over what he was doing.” (backup/backup)
  51. Question 2. Do developers agree that their projects will be in trouble if they lose the truck factor authors?

    Agree: 39%; Partially: 10%; Disagree: 43%; Unclear: 8%. “If both of us left, the project would be kind of unmaintained.” (SFTtech/openage) “Somewhat agree. A loss in one area would mean a temporary dip in maintenance of that area until someone else stepped in.” (saltstack/salt) “No, I think others could take over.” (silexphp/Silex)
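The percentages on the Question 1 and Question 2 slides (50/29/15/6 and 39/10/43/8) differ from the 84% and 53% quoted in the abstract for "valid answers". One way to reconcile them, sketched here under the assumption that "Unclear" answers were not counted as valid, is to renormalize over the remaining three categories. The function name is hypothetical.

```python
def positive_share_of_valid(agree, partially, disagree):
    """Share of agree + partially agree among valid answers.

    Inputs are the slide percentages; 'Unclear' answers are left out,
    which is an assumption about how valid answers were counted.
    """
    valid = agree + partially + disagree
    return (agree + partially) / valid

q1 = positive_share_of_valid(50, 29, 15)  # 79 / 94
q2 = positive_share_of_valid(39, 10, 43)  # 49 / 92

q1_percent = round(q1 * 100)
q2_percent = round(q2 * 100)
```

Under that assumption the two shares round to 84% and 53%, which is consistent with the figures in the abstract.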
  55. Question 3. What are the development practices that can attenuate the loss of top-ranked authors?

    “I’d say that the vibrant community is the reason for it.” (rails/rails) “The project is pretty well-documented.” (wp-cli/wp-cli) “The people you listed are paid to work on the project, along with a number of others.” (ipython/ipython)
  59. Summary

    We presented a new approach to compute truck factor. Most of the systems (65%) have TF ≤ 2. We assess the reliability and limitations of our approach by surveying the systems’ developers.
  63. A Novel Approach for Estimating Truck Factors

    http://aserg.labsoft.dcc.ufmg.br/truckfactor Thank you for your attention!