A Novel Approach for Estimating Truck Factors (ICPC 2016)

Truck Factor (TF) is a metric proposed by the agile community as a tool to identify concentration of knowledge in software development environments. It states the minimum number of developers that have to be hit by a truck (or quit) before a project is incapacitated. In other words, TF helps to measure how prepared a project is to deal with developer turnover. Despite its clear relevance, few studies explore this metric. Moreover, there is no consensus on how to calculate it, and no empirical evidence backs TF estimates for systems in the wild. To mitigate both issues, we propose a novel (and automated) approach for estimating TF values, which we execute against a corpus of 133 popular projects on GitHub. We then survey developers to assess the reliability of our results. Among other findings, we observe that the majority of our target systems (65%) have TF ≤ 2.

ASERG, DCC, UFMG

May 17, 2016

Transcript

  1. A Novel Approach for Estimating Truck Factors (ICPC 2016)

    Guilherme Avelino*†, Leonardo Passos‡, Andre Hora* and Marco Tulio Valente*

    *Federal University of Minas Gerais (UFMG), Brazil; †Federal University of Piaui (UFPI), Brazil; ‡University of Waterloo, Canada
  2. “What if you saw this posted tomorrow: Guido’s unexpected death has come as a shock to us all. Disgruntled members of the Tcl mob are suspected, but no smoking gun has been found...”

    (Python’s mailing list discussion, 1994)
  3. Truck Factor

    “The number of people on your team that have to be hit by a truck (or quit) before the project is in serious trouble.”
  7. Existing Approaches

    Prior approaches (Zazworka et al., ESEM 2010; Ricca et al., ESEM 2010) rely on a simplistic code ownership metric, only work for small teams (fewer than 40 developers), and provide no empirical evidence from real-world systems.
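    To see why exhaustive formulations tend to be confined to small teams, consider a naive search over developer subsets. The sketch below is a generic brute-force formulation, not claimed to be the exact algorithm of Zazworka et al. or Ricca et al. It assumes a hypothetical ownership rule in which every developer who ever touched a file “knows” it, and a hypothetical 50% orphaned-files threshold; the function name is illustrative.

```python
from itertools import combinations


def brute_force_truck_factor(file_authors: dict[str, set[str]],
                             threshold: float = 0.5) -> int:
    """Smallest number of developers whose removal leaves more than `threshold`
    of the files with no remaining author (hypothetical ownership rule:
    every developer who touched a file "knows" it)."""
    developers = sorted(set().union(*file_authors.values()))
    total = len(file_authors)
    for size in range(1, len(developers) + 1):
        # Exhaustive search: C(n, size) subsets at each size, up to 2**n overall.
        for removed in combinations(developers, size):
            gone = set(removed)
            orphaned = sum(1 for authors in file_authors.values() if authors <= gone)
            if orphaned / total > threshold:
                return size
    return len(developers)


files = {"a.py": {"alice"}, "b.py": {"alice", "bob"},
         "c.py": {"bob"}, "d.py": {"carol"}}
print(brute_force_truck_factor(files))  # 2
```

    In the worst case the search visits every one of the 2**n developer subsets; with 40 developers that is 2**40, about 1.1 trillion subsets, which is consistent with such approaches not scaling beyond small teams.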
  21. STEP 4: Define Authorship

    Authors are identified with the Degree of Authorship (DOA) metric, which together with the Degree of Interest (DOI) makes up the Degree of Knowledge (DOK) model: DOK = DOI + DOA (Fritz et al., TOSEM 2014).
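    A minimal sketch of how a DOA score can turn a file's change history into a set of authors. It assumes the linear DOA model of Fritz et al., DOA = 3.293 + 1.098·FA + 0.164·DL − 0.321·ln(1 + AC), where FA indicates whether the developer created the file, DL counts the developer's changes to it, and AC counts changes made by other developers. The coefficients should be double-checked against the TOSEM paper, and the author-selection thresholds used here (normalized DOA above 0.75 and absolute DOA of at least 3.293) are assumptions; the function names are hypothetical.

```python
import math

# DOA coefficients as published by Fritz et al. (assumption: verify against the paper).
BASE, W_FA, W_DL, W_AC = 3.293, 1.098, 0.164, 0.321


def doa(first_author: bool, deliveries: int, acceptances: int) -> float:
    """Absolute degree of authorship of one developer for one file.

    first_author: the developer created the file (FA)
    deliveries:   changes the developer made to the file (DL)
    acceptances:  changes other developers made to the file (AC)
    """
    return (BASE + W_FA * int(first_author) + W_DL * deliveries
            - W_AC * math.log(1 + acceptances))


def file_authors(history: dict[str, tuple[bool, int, int]],
                 min_normalized: float = 0.75,
                 min_absolute: float = BASE) -> set[str]:
    """Hypothetical author selection: keep developers whose DOA, normalized by
    the file's highest DOA, exceeds min_normalized and whose absolute DOA is at
    least min_absolute (both thresholds are assumptions)."""
    scores = {dev: doa(*stats) for dev, stats in history.items()}
    if not scores:
        return set()
    top = max(scores.values())
    if top <= 0:
        return set()
    return {dev for dev, s in scores.items()
            if s / top > min_normalized and s >= min_absolute}


# Alice created the file and made 10 changes; Bob made 2 changes.
history = {"alice": (True, 10, 2), "bob": (False, 2, 10)}
print(file_authors(history))  # {'alice'}
```

    Here Alice's DOA is about 5.68 while Bob's (about 2.85) falls below both thresholds. Normalizing by the file's top score keeps authorship relative, so a file edited heavily by two developers can still end up with both as authors.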
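  27. STEP 5: Estimate Truck Factor

    [Figure: the system's files, each labeled with its author (A1, A1, A1, A1, A1, A2, A2, A3, A3, A4, A5, A6, A7, A8, A9, An), beside a chart of the number of files per author (A1, A2, A3, A4, …, An). Top authors are crossed out (X) one at a time until the files left without an author reach the 50% threshold; in the example this takes three removals, so TF = 3.]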
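    The procedure illustrated on this slide can be sketched as a greedy loop over a file-to-authors map (for example, the output of an author-selection step such as STEP 4). This is a minimal sketch under stated assumptions: ties between equally ranked authors are broken arbitrarily, the loop stops once strictly more than 50% of the files have no remaining author, and the 16-file example is a hypothetical system loosely modeled on the slide's figure, not data from the study.

```python
from collections import Counter


def truck_factor(file_authors: dict[str, set[str]], threshold: float = 0.5) -> int:
    """Greedy TF estimate: repeatedly remove the developer who authors the most
    files until more than `threshold` of the files have no remaining author."""
    authors = {f: set(devs) for f, devs in file_authors.items()}  # copy, mutated below
    total = len(authors)
    tf = 0
    while total:
        orphaned = sum(1 for devs in authors.values() if not devs)
        if orphaned / total > threshold:
            break
        counts = Counter(dev for devs in authors.values() for dev in devs)
        if not counts:  # every file already orphaned (only possible if threshold >= 1)
            break
        top_author, _ = counts.most_common(1)[0]  # ties: whichever Counter ranks first
        for devs in authors.values():
            devs.discard(top_author)
        tf += 1
    return tf


# Hypothetical system shaped like the slide: A1 authors 5 files, A2 and A3 two each,
# A4..A9 and An one each (16 files in total).
files_per_author = {"A1": 5, "A2": 2, "A3": 2, "A4": 1, "A5": 1, "A6": 1,
                    "A7": 1, "A8": 1, "A9": 1, "An": 1}
files = {f"{dev}/file{i}": {dev}
         for dev, n in files_per_author.items() for i in range(n)}
print(truck_factor(files))  # 3
```

    After removing A1, A2 and A3, 9 of the 16 files have no author (about 56%), so the sketch returns 3, matching the TF = 3 on the slide. Picking the top author at each step avoids an exponential subset search, but as a heuristic it is not guaranteed to find the smallest possible set in every case.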
  39. Truck Factor Results

    [Figure: estimated truck factors of the target systems on an axis from 0 to 60, with callouts for TF = 57, TF = 18, TF = 12, TF = 11 and TF = 9.]
  50. Question 1: Do developers agree that the top-ranked authors are the main developers of their projects?

    Agree: 50%, Partially agree: 29%, Disagree: 15%, Unclear: 6%

    “Yes, that’s me.” (bjorn/tiled)
    “Yes, but I think DevX should be in that list too.” (Respect/Validation)
    “No, DevY hasn’t been contributing for a while now. I’ve taken over what he was doing.” (backup/backup)
  54. Question 2: Do developers agree that their projects will be in trouble if they lose the truck factor authors?

    Agree: 39%, Partially agree: 10%, Disagree: 43%, Unclear: 8%

    “If both of us left, the project would be kind of unmaintained.” (SFTtech/openage)
    “Somewhat agree. A loss in one area would mean a temporary dip in maintenance of that area until someone else stepped in.” (saltstack/salt)
    “No, I think others could take over.” (silexphp/Silex)
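    The survey percentages for Questions 1 and 2 can also be read relative to the answers that were not unclear. The sketch below assumes the slide percentages are rounded shares of all answers and that “Unclear” answers are simply excluded, so the derived figures are approximate; the function name is hypothetical.

```python
def positive_share(agree: float, partially: float, unclear: float) -> float:
    """Percentage of agree or partially-agree answers among the answers that
    were not classified as unclear (inputs are percentages of all answers)."""
    return 100 * (agree + partially) / (100 - unclear)


q1 = positive_share(agree=50, partially=29, unclear=6)  # Question 1
q2 = positive_share(agree=39, partially=10, unclear=8)  # Question 2
print(f"Q1: {q1:.0f}%  Q2: {q2:.0f}%")  # Q1: 84%  Q2: 53%
```

    So 79% of all Question 1 answers agree at least partially (about 84% of the clear answers), against 49% for Question 2 (about 53% of the clear answers).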
  55. Question 3: What development practices can attenuate the loss of top-ranked authors?

    “I’d say that the vibrant community is the reason for it.” (rails/rails)
    “The project is pretty well-documented.” (wp-cli/wp-cli)
    “The people you listed are paid to work on the project, along with a number of others.” (ipython/ipython)
  62. Summary

    We presented a new approach to compute the truck factor. Most of the systems (65%) have TF ≤ 2. We assessed the reliability and limitations of our approach by surveying the systems' developers.
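    As a quick consistency check on the summary figure, assuming the 65% is computed over all 133 target systems and rounded to the nearest percent, only two system counts are compatible with it:

```python
TOTAL_SYSTEMS = 133  # corpus size reported in the abstract

# Counts n of systems with TF <= 2 whose share of the corpus rounds to 65%.
compatible = [n for n in range(TOTAL_SYSTEMS + 1)
              if round(100 * n / TOTAL_SYSTEMS) == 65]
print(compatible)  # [86, 87]
```

    That is, roughly 86 or 87 of the 133 systems would have a truck factor of at most 2.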