Understanding the Factors that Impact the Popularity of GitHub Repositories (ICSME 2016)

Software popularity is valuable information for modern open source developers, who constantly want to know whether their systems are attracting new users, whether new releases are gaining acceptance, or whether they are meeting users’ expectations. In this paper, we describe a study on the popularity of software systems hosted on GitHub, the world’s largest collection of open source software. GitHub provides an explicit way for users to express their satisfaction with a hosted repository: the stargazers button. In our study, we reveal the main factors that impact the number of stars of GitHub projects, including programming language and application domain. We also study the impact of new features on project popularity. Finally, we identify four main patterns of popularity growth, derived by clustering the time series that represent the number of stars of 2,279 popular GitHub repositories. We hope our results provide valuable insights to developers and maintainers, helping them build and evolve systems in a competitive software market.

ASERG, DCC, UFMG

October 07, 2016

Transcript

  1. Understanding the Factors that Impact the Popularity of GitHub Repositories

    Hudson Borges, Andre Hora, Marco Tulio Valente {hsborges, hora, mtov}@dcc.ufmg.br
  2. Social Coding Features
    “Stars are used to show appreciation to the repository maintainer for their work”
  3. Our Goal
    • Investigate the factors and the patterns that govern the popularity of GitHub repositories
    • Attractiveness
    • Competitiveness
  4. Data Collection
    • March 2016
    • Top-2,500 repositories
      ◦ Stars, creation date, language, etc.
    • Historical data on the number of stars
    • Six application domains
      ◦ Manual classification
      ◦ https://goo.gl/73Sbvz
    • Official GitHub API
  5. Popularity Growth Patterns
    • K-Spectral Centroid clustering algorithm
      ◦ Clusters time series with similar shapes
      ◦ Invariant to scaling and shifting
  6. Growth Patterns x Language
    [Chart: share of repositories per growth pattern (Slow, Moderate, Fast, Viral). Values shown: 92.6%, 4.5%, 1.1%, 1.6%; 38.7%, 51.9%, 7.6%, 1.6%]
  7. Growth Patterns x Domain
    [Chart: share of repositories per growth pattern (Slow, Moderate, Fast, Viral). Values shown: 75.5%, 18.2%, 4.9%, 1.2%; 51.9%, 33.5%, 9.8%, 4.6%]
  8. Developers Feedback
    A. Impact of account types (users vs organizations)
    B. Features implemented in successful releases
    C. Reasons for viral growth
  9. Repositories Owned by Users
    Repositories owned by organizations are more popular than the ones owned by individuals (RQ #1)
    • Top-100 Repositories
    • 30 Users
    • 17 Public emails
    • 5 Responses
  10. Repositories Owned by Users
    1. Do you plan to migrate to an organization account?
    ➢ All developers answered negatively
    “I worked hard to create the project, and having it under my personal username is necessary to have proper credit for it.”
  11. Reasons for Viral Growth
    • 22 Viral growth
    • 14 Responses
    How do you explain the peaks in the number of stars?
  12. Reasons for Viral Growth
    • 22 Viral growth
    • 14 Responses
    How do you explain the peaks in the number of stars?
    “I posted about this project on HackerNews. It quickly got a lot of attention and remained on the front page of HackerNews for over 24 hours ...”
  13. Conclusion: Factors and Properties
    ➢ Domains/languages/owners may impact popularity
    ➢ Strong correlation between popularity and forks
    ➢ Repositories receive more stars right after creation and after releases
  14. Conclusion: Popularity Growth Patterns
    ➢ Slow, Moderate, Fast, and Viral
    ➢ Slow is the most common pattern
    ➢ Viral is the least common pattern
  15. Conclusion: Qualitative Study
    ➢ Developers look for recognition
    ➢ Popular features are identified by developers themselves
    ➢ Peaks of popularity are triggered by social media posts
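
The slide on popularity growth patterns describes K-Spectral Centroid (KSC) clustering as invariant to scaling and shifting. The sketch below illustrates only the distance measure that gives KSC those two properties, as it is commonly defined for the algorithm: the residual left after the best rescaling and time shift of one series onto the other, normalized by the norm of the first series. It is not the authors' code and it omits the centroid update step of the full algorithm. The names `shift`, `ksc_distance`, and `max_shift` are hypothetical, and the input series are made-up numbers rather than data from the study.

```python
import math


def shift(series, q):
    """Shift a series by q positions, padding with zeros (q > 0 shifts right)."""
    n = len(series)
    if q == 0:
        return list(series)
    if q > 0:
        return [0.0] * min(q, n) + list(series[: max(n - q, 0)])
    return list(series[-q:]) + [0.0] * min(-q, n)


def ksc_distance(x, y, max_shift=0):
    """Scale- and shift-invariant distance between two equal-length series.

    For each candidate shift q, the optimal scaling factor has the closed form
    alpha = (x . y_q) / ||y_q||^2; the distance is the smallest normalized
    residual ||x - alpha * y_q|| / ||x|| over all shifts tried.
    """
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be all zeros")
    best = float("inf")
    for q in range(-max_shift, max_shift + 1):
        y_q = shift(y, q)
        denom = sum(v * v for v in y_q)
        alpha = sum(a * b for a, b in zip(x, y_q)) / denom if denom else 0.0
        residual = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, y_q)))
        best = min(best, residual / norm_x)
    return best


# Hypothetical star-count series: same shape, different magnitude.
small_repo = [1, 2, 4, 8]
large_repo = [10, 20, 40, 80]
same_shape = ksc_distance(small_repo, large_repo)

# Hypothetical series with the same shape, offset by one time step.
early = [1, 2, 4, 0]
late = [0, 1, 2, 4]
unaligned = ksc_distance(late, early, max_shift=0)
aligned = ksc_distance(late, early, max_shift=1)
```

Under this measure, two repositories whose star counts grow with the same shape end up close together regardless of how many stars they have in absolute terms or when the growth started, which is why the clustering groups series by shape rather than by size.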