Why Algorithms for Estimating Truck Factors Fail? (VEM Workshop, 2018)

In this paper, we present a set of reasons for the incorrect results produced by algorithms that estimate truck (or bus) factors (TF), which are usually based on the analysis of commits. We report that commits do not explain the TF of 10 systems in a dataset of 33 systems with well-known TF results.

ASERG, DCC, UFMG

September 20, 2017

Transcript

  1. Why Algorithms
    for Estimating
    Truck Factors
    Fail
    Thaís Mombach (UFMG)
    Mívian Ferreira (UFMG)
    Kecia Ferreira (CEFET-MG)
    Marco Tulio Valente (UFMG)

  2. Truck (or Bus) Factor
    The minimum number of developers who, if hit by a truck
    (or bus), would put a project at serious risk

  3. TF reveals the concentration of knowledge in software
    projects, both open source and commercial ones

  4. Work with the highest impact of our group
    among practitioners

  5. PeerJ 2015 (preprint, targeting practitioners)
    https://peerj.com/preprints/1233

  6. Still remembered today …
    "Oh, nice to hear from you! I heard a lot about (and read)
    your group's truck factor paper. Cool work!"
    (answer received recently in another survey, unrelated to TF)

  7. ICPC 2016: Algorithm for Estimating TFs
    https://arxiv.org/abs/1604.06766

  8. Tool Support
    https://github.com/aserg-ufmg/Truck-Factor

  9. ICPC 2017: Comparative Study

  10. ICPC 2017: Comparative Study
    A Comparison of Three Algorithms for Computing Truck Factors, ICPC 2017

  11. Why TF algorithms fail

  12. Oracle
    • 33 systems; reused from previous work
    • Built by surveying developers

  13. When do commits explain TF results?
    • Suppose:
      • TF: the set of developers responsible for the TF of a system
      • size(TF) = n
      • C: the set of the top-n developers with the most commits
    • If TF == C, then commits explain the system's TF
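
    The check on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `commits_explain_tf`, the developer names, and the commit counts are hypothetical, and ties in commit counts are broken alphabetically, a detail the slides do not specify.

```python
def commits_explain_tf(tf_devs, commit_counts):
    """Return True if the top-n committers are exactly the TF developers.

    tf_devs: set of developers responsible for the system's TF (the oracle).
    commit_counts: dict mapping developer name -> number of commits.
    """
    n = len(tf_devs)
    # Rank developers by commit count, highest first.
    # Assumption: ties are broken alphabetically to keep the result deterministic.
    ranked = sorted(commit_counts, key=lambda dev: (-commit_counts[dev], dev))
    top_n = set(ranked[:n])  # C: the top-n developers with the most commits
    return top_n == set(tf_devs)


# Hypothetical data, not taken from the study.
commits = {"alice": 120, "bob": 45, "carol": 30}

print(commits_explain_tf({"alice"}, commits))         # True: alice is the top committer
print(commits_explain_tf({"bob", "carol"}, commits))  # False: the top 2 are alice and bob
```

    In the second call, the oracle names two developers, but the two most active committers include someone outside that set, so commits alone would not explain the TF.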

  14. Number of systems where commits explain (or do not explain) the TF results
    TF      Explain   Do not explain
    1       17        2
    2       4         1
    3       2         1
    4       0         3
    5       0         1
    11      0         1
    15      0         1
    TOTAL   23        10
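
    The counts on this slide can be tallied with a short Python sketch to confirm the totals and the share of systems that commits do not explain. The variable names are illustrative. The numbers are copied from the table.

```python
# TF value -> (systems where commits explain the TF, systems where they do not)
rows = {
    1: (17, 2),
    2: (4, 1),
    3: (2, 1),
    4: (0, 3),
    5: (0, 1),
    11: (0, 1),
    15: (0, 1),
}

explained = sum(e for e, _ in rows.values())
not_explained = sum(u for _, u in rows.values())
total = explained + not_explained
share_not_explained = not_explained / total

# Systems with TF >= 4, split the same way.
high_tf_explained = sum(e for tf, (e, _) in rows.items() if tf >= 4)
high_tf_not_explained = sum(u for tf, (_, u) in rows.items() if tf >= 4)

print(explained, not_explained, total)           # 23 10 33
print(f"{share_not_explained:.1%}")              # 30.3%
print(high_tf_explained, high_tf_not_explained)  # 0 6
```

    In this table, none of the six systems with a TF of 4 or more is explained by commits, whereas most systems with a TF of 1 are.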

  15. Commits do not explain the TF of 10 systems
    (out of 33 systems)

  16. How to explain the TF of these systems?
    • Survey with 20 devs, from the 10 systems
    • 17 e-mails, 7 answers

  17. Reason #1: Social roles (4 answers)
    "I was one of the more vocal people on IRC, mailing lists
    and other channels ..."
    "Going at conferences, meetup to evangelize the project,
    and talk about it"

  18. Reason #2: Testing & Quality Assurance (2 answers)
    "Testing the project and filling bug reports"

  19. Reason #3: Pull Requests and Merges (2 answers)
    "Consider other measures: number of pull requests
    merged"

  20. Interview with Linus Torvalds (on TF & Reason #3)
    https://www.bloomberg.com/news/articles/2015-06-16/the-creator-of-linux-on-the-future-without-him

  21. Interview with Linus Torvalds (on TF & Reason #3)
    "There is no concrete plan of action if I die.
    But that would have been a bigger deal 10 or 15 years ago."
    ….
    "[Today] most of the code I get is written by tens of people"

  22. Conclusion
    • Commits do not explain the TF of ~30% of the systems (10 out of 33)
    • Other important factors:
      • Social roles
      • Testing and quality assurance
      • Pull requests and merges
      • Work on documentation
      • Work on related tools (e.g., plug-ins)
