Understanding the Factors that Impact the Popularity of GitHub Repositories (ICSME 2016)

Software popularity is valuable information for modern open source developers, who constantly want to know if their systems are attracting new users, if new releases are gaining acceptance, or if they are meeting users’ expectations. In this paper, we describe a study on the popularity of software systems hosted on GitHub, which is the world’s largest collection of open source software. GitHub provides an explicit way for users to manifest their satisfaction with a hosted repository: the stargazers button. In our study, we reveal the main factors that impact the number of stars of GitHub projects, including programming language and application domain. We also study the impact of new features on project popularity. Finally, we identify four main patterns of popularity growth, which are derived by clustering the time series representing the number of stars of 2,279 popular GitHub repositories. We hope our results provide valuable insights to developers and maintainers, which could help them build and evolve systems in a competitive software market.

ASERG, DCC, UFMG

October 07, 2016

Transcript

  1. Understanding the Factors that Impact the Popularity of GitHub Repositories

    Hudson Borges, Andre Hora, Marco Tulio Valente {hsborges, hora, mtov}@dcc.ufmg.br
  2. Introduction: 15M users, 36M repositories

  3. Social Coding Features

    “Stars are used to show appreciation to the repository maintainer for their work”
  4. “Our First 50,000 Stars” https://facebook.github.io/react/blog/2016/09/28/our-first-50000-stars.html

  5. Our Goal

    • Investigate the factors and the patterns that govern the popularity of GitHub repositories
    Attractiveness, Competitiveness
  6. Our work

    1. Quantitative Analysis
    2. Popularity growth patterns
    3. Qualitative Study with Developers
  7. Data Collection

    • March, 2016
    • Top-2,500 repositories
      ◦ Stars, creation date, language, etc.
    • Historical data on number of stars
    • Six application domains
      ◦ Manual classification
      ◦ https://goo.gl/73Sbvz
    Official GitHub API
  8. (Some) Results

  9. Programming Language

    Distributions differ across programming languages (Kruskal-Wallis test)
    … 50+ different languages
  10. Application Domain

    Distributions differ across application domains (Kruskal-Wallis test)
  11. Repository Owner

    Distributions differ across repository owner types (Mann-Whitney test)

  12. Correlation Analysis

    • Age: no correlation
    • Contributors: weak correlation
    • Commits: weak correlation
    • Forks: strong correlation
  13. Popularity Growth Patterns

    • K-Spectral Centroid clustering algorithm
      ◦ Clusters time series with similar shapes
      ◦ Invariant to scaling and shifting
  14. Popularity Growth Patterns: Slow, Moderate, Fast, Viral

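  15. Growth Patterns x Language

    [Chart: share of repositories per growth pattern (Slow, Moderate, Fast, Viral); values shown: 92.6%, 4.5%, 1.1%, 1.6% and 38.7%, 51.9%, 7.6%, 1.6%]
  16. Growth Patterns x Domain

    [Chart: share of repositories per growth pattern (Slow, Moderate, Fast, Viral); values shown: 75.5%, 18.2%, 4.9%, 1.2% and 51.9%, 33.5%, 9.8%, 4.6%]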
  17. Developer Feedback

    A. Impact of account types (users vs organizations)
    B. Features implemented in successful releases
    C. Reasons for viral growth
  18. Repositories Owned by Users

    Repositories owned by organizations are more popular than the ones owned by individuals (RQ #1)
    • Top-100 repositories
    • 30 users
    • 17 public emails
    • 5 responses
  19. Repositories Owned by Users

    1. Do you plan to migrate to an organization account?
      ➢ All developers answered negatively
    “I worked hard to create the project, and having it under my personal username is necessary to have proper credit for it.”
  20. Successful Releases

    • 60 releases with highest impact
    • 25 responses
    How were the features selected?
  21. Reasons for Viral Growth

    • 22 cases of viral growth
    • 14 responses
    How do you explain the peaks in the number of stars?
  22. Reasons for Viral Growth

    • 22 cases of viral growth
    • 14 responses
    How do you explain the peaks in the number of stars?
    “I posted about this project on HackerNews. It quickly got a lot of attention and remained on the front page of HackerNews for over 24 hours ...”
  23. Conclusion: Factors and Properties

    ➢ Domains/languages/owners may impact popularity
    ➢ Strong correlation between popularity and forks
    ➢ Repositories receive more stars right after creation and after releases
  24. Conclusion: Popularity Growth Patterns

    ➢ Slow, Moderate, Fast, and Viral
    ➢ Slow is the most common pattern
    ➢ Viral is the least common pattern
  25. Conclusion: Qualitative Study

    ➢ Developers look for recognition
    ➢ Popular features are identified by developers themselves
    ➢ Peaks of popularity are triggered by social media posts
  26. Future Work

    ➢ Popular vs unpopular repositories
    ➢ Causality analysis
    ➢ Popularity prediction
  27. Tool Support: http://gittrends.io

  28. Thank you!
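
Slides 9 and 10 name the Kruskal-Wallis test but do not show how it is computed. As a minimal sketch, assuming the usual rank-based H statistic with tie correction, the comparison could look like this in plain Python. The function names and the star counts below are hypothetical and are not taken from the study.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical star counts per language (illustrative only).
stars_by_language = {
    "JavaScript": [5200, 8100, 12400, 3900],
    "Python": [4100, 6700, 3500, 9800],
    "Go": [3100, 2900, 7600, 4400],
}
h = kruskal_wallis_h(list(stars_by_language.values()))
print(f"H = {h:.3f}")
```

A larger H means the rank distributions of the groups differ more. In practice the statistic is compared against a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups.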
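
Slide 13 says that K-Spectral Centroid groups time series by shape and is invariant to scaling and shifting. The sketch below illustrates only that invariance, assuming the distance commonly given for the algorithm: the residual left after the best rescaling and time shift of one series, relative to the norm of the other. The function names, the `max_shift` parameter, and the sample series are hypothetical, and the clustering step itself (assigning series to clusters and updating centroids) is not shown.

```python
import math


def shift(series, q):
    """Shift a series by q steps, padding with zeros (q > 0 moves it right)."""
    n = len(series)
    if q >= 0:
        return [0.0] * min(q, n) + list(series[: max(n - q, 0)])
    return list(series[-q:]) + [0.0] * min(-q, n)


def ksc_distance(x, y, max_shift=2):
    """Scale- and shift-invariant distance between two equal-length series.

    For each shift q in [-max_shift, max_shift], the best scaling factor is
    alpha = (x . y_q) / ||y_q||^2, and the distance is the smallest value of
    ||x - alpha * y_q|| / ||x|| over all shifts.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be all zeros")

    best = float("inf")
    for q in range(-max_shift, max_shift + 1):
        y_q = shift(y, q)
        denom = sum(v * v for v in y_q)
        alpha = sum(a * b for a, b in zip(x, y_q)) / denom if denom else 0.0
        residual = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, y_q)))
        best = min(best, residual / norm_x)
    return best


# Hypothetical weekly star gains (illustrative only).
repo_a = [0, 1, 2, 1, 0, 0]
repo_b = [0, 0, 3, 6, 3, 0]   # same shape as repo_a, three times larger, one week later
repo_c = [1, 1, 1, 1, 1, 1]   # flat growth, a different shape

print(ksc_distance(repo_a, repo_b))  # close to zero: same shape
print(ksc_distance(repo_a, repo_c))  # clearly larger: different shape
```

Because the best scaling factor is solved in closed form for each candidate shift, two repositories with the same growth shape end up close together even when one of them is far more popular or started gaining stars later.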