Why Algorithms for Estimating Truck Factors Fail? (VEM Workshop, 2018)

In this paper, we present a set of reasons why algorithms for estimating truck (or bus) factors, which are usually based on the analysis of commits, produce incorrect results. We report that commits do not explain the TF of 10 systems in a dataset of 33 systems with well-known TF results.

ASERG, DCC, UFMG

September 20, 2017

Transcript

  1. Why Algorithms for Estimating Truck Factors Fail
    Thaís Mombach (UFMG)
    Mívian Ferreira (UFMG)
    Kecia Ferreira (CEFET-MG)
    Marco Tulio Valente (UFMG)

  2. Truck (or Bus) Factor
    The minimum number of developers who, if hit by a truck
    (or bus), would put a project at serious risk

  3. TF reveals the concentration of knowledge in software
    projects, both open source and commercial ones

  4. Work with the highest impact of our group
    among practitioners

  5. PeerJ 2015 (preprint, targeting practitioners)
    https://peerj.com/preprints/1233

  6. Still remembered today ...
    "Oh, nice to hear from you! I heard a lot about (and read)
    your group's truck factor paper. Cool work!"
    (answer received recently in another survey, unrelated to TF)

  7. ICPC 2016: Algorithm for Estimating TFs
    https://arxiv.org/abs/1604.06766

  8. Tool Support
    https://github.com/aserg-ufmg/Truck-Factor

  9. ICPC 2017: Comparative Study

  10. ICPC 2017: Comparative Study
    A Comparison of Three Algorithms for Computing Truck Factors, ICPC 2017

  11. Why TF algorithms fail

  12. Oracle
    • 33 systems; reused from previous work
    • Built by surveying developers

  13. When do commits explain TF results?
    • Suppose:
      • TF: the set of devs responsible for the TF of a system
      • size(TF) = n
      • C: the set of the top-n devs with the most commits
    • If TF == C, then commits explain the system’s TF
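
The criterion above can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the function names, the developer names, and the commit list are hypothetical, and ties in commit counts are broken by first appearance in the log, a case the slides do not address.

```python
from collections import Counter


def top_n_committers(commit_authors, n):
    """Return the set of the n developers with the most commits.

    commit_authors holds one author name per commit.
    Assumption: ties are broken by first appearance in the list.
    """
    counts = Counter(commit_authors)
    return {dev for dev, _ in counts.most_common(n)}


def commits_explain_tf(tf_devs, commit_authors):
    """True if the TF set equals the set C of top-n committers, n = size(TF)."""
    tf = set(tf_devs)
    c = top_n_committers(commit_authors, len(tf))
    return tf == c


# Hypothetical commit log: one entry per commit.
commit_log = ["ana", "ana", "ana", "bob", "bob", "carol"]

print(commits_explain_tf({"ana"}, commit_log))         # True
print(commits_explain_tf({"ana", "bob"}, commit_log))  # True
print(commits_explain_tf({"carol"}, commit_log))       # False
```

In the last call the oracle names a developer with few commits, which is the kind of mismatch the rest of the talk investigates.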

  14. Number of systems where commits explain, or do not explain, the TF results

    TF      # systems where commits    # systems where commits
            explain the TF results     do not explain the TF results
    1       17                         2
    2       4                          1
    3       2                          1
    4       0                          3
    5       0                          1
    11      0                          1
    15      0                          1
    TOTAL   23                         10
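
As a quick check, the table can be tallied with a short script. The numbers are copied from the slide; the variable names are illustrative only.

```python
# (TF, systems where commits explain TF, systems where they do not)
rows = [
    (1, 17, 2),
    (2, 4, 1),
    (3, 2, 1),
    (4, 0, 3),
    (5, 0, 1),
    (11, 0, 1),
    (15, 0, 1),
]

explained = sum(e for _, e, _ in rows)
not_explained = sum(u for _, _, u in rows)
total = explained + not_explained
share_not_explained = not_explained / total

# Systems with TF >= 4, split by column.
large_tf_explained = sum(e for tf, e, _ in rows if tf >= 4)
large_tf_not_explained = sum(u for tf, _, u in rows if tf >= 4)

print(explained, not_explained, total)   # 23 10 33
print(f"{share_not_explained:.1%}")      # 30.3%
print(large_tf_explained, large_tf_not_explained)  # 0 6
```

The totals match the slide (23 and 10 out of 33), and in this dataset none of the six systems with a TF of 4 or more is explained by commits.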

  15. Commits do not explain the TF of 10 systems
    (out of 33 systems)

  16. How to explain the TF of these systems?
    • Survey with 20 devs, from the 10 systems
    • 17 e-mails, 7 answers

  17. Reason #1: Social roles (4 answers)
    "I was one of the more vocal people on IRC, mailing lists
    and other channels ..."
    "Going at conferences, meetup to evangelize the project,
    and talk about it"

  18. Reason #2: Testing & Quality Assurance (2 answers)
    "Testing the project and filling bug reports"

  19. Reason #3: Pull Requests and Merges (2 answers)
    "Consider other measures: number of pull requests
    merged"

  20. Interview with Linus Torvalds (on TF & Reason #3)
    https://www.bloomberg.com/news/articles/2015-06-16/the-creator-of-linux-on-the-future-without-him

  21. Interview with Linus Torvalds (on TF & Reason #3)
    "There is no concrete plan of action if I die.
    But that would have been a bigger deal 10 or 15 years ago."
    ...
    "[Today] most of the code I get is written by tens of people"

  22. Conclusion
    • Commits do not explain the TF of ~30% of the systems
    • Other important factors:
      • Social roles
      • Testing and Quality Assurance
      • Pull requests and merges
      • Work on documentation
      • Work on related tools (e.g., plug-ins)
