Understanding the Factors that Impact the Popularity of GitHub Repositories (ICSME 2016)

Software popularity is valuable information for modern open source developers, who constantly want to know whether their systems are attracting new users, whether new releases are gaining acceptance, and whether they are meeting users’ expectations. In this paper, we describe a study on the popularity of software systems hosted on GitHub, the world’s largest collection of open source software. GitHub provides an explicit way for users to express their satisfaction with a hosted repository: the stargazers button. In our study, we reveal the main factors that impact the number of stars of GitHub projects, including programming language and application domain. We also study the impact of new features on project popularity. Finally, we identify four main patterns of popularity growth, derived by clustering the time series that represent the number of stars of 2,279 popular GitHub repositories. We hope our results provide valuable insights to developers and maintainers and help them build and evolve systems in a competitive software market.

ASERG, DCC, UFMG

October 07, 2016

Transcript

  1. Understanding the Factors
    that Impact the Popularity of
    GitHub Repositories
    Hudson Borges, Andre Hora, Marco Tulio Valente
    {hsborges, hora, mtov}@dcc.ufmg.br

  2. Introduction
    15M users 36M repositories

  3. Social Coding Features
    “Stars are used to show appreciation to the repository maintainer for their work”

  4. “Our First 50,000 Stars”
    https://facebook.github.io/react/blog/2016/09/28/our-first-50000-stars.html

  5. Our Goal
    ● Investigate the factors and the patterns that govern the
    popularity of GitHub repositories
    Attractiveness Competitiveness

  6. Our work
    1. Quantitative Analysis
    2. Popularity growth patterns
    3. Qualitative Study with Developers

  7. Data Collection
    ● Source: official GitHub API
    ● March 2016
    ● Top-2,500 repositories
    ○ Stars, creation date, language, etc.
    ● Historical data on the number of stars
    ● Six application domains
    ○ Manual classification
    ○ https://goo.gl/73Sbvz
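
The collection step could be sketched as follows, assuming the repository search endpoint of the GitHub REST API is used. The helper names, the `stars:>1` query, and the page size are illustrative assumptions, not the authors' actual scripts. The search endpoint returns at most 1,000 results per query, so reaching 2,500 repositories would need several queries over different star ranges.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com/search/repositories"

def build_search_url(page, per_page=100, query="stars:>1"):
    """URL for one page of repositories sorted by star count (descending)."""
    params = {"q": query, "sort": "stars", "order": "desc",
              "per_page": per_page, "page": page}
    return API + "?" + urllib.parse.urlencode(params)

def extract_fields(payload):
    """Keep only the attributes named on the slide from a search response."""
    return [
        {
            "name": item["full_name"],
            "stars": item["stargazers_count"],
            "created_at": item["created_at"],
            "language": item["language"],
            "owner_type": item["owner"]["type"],  # "User" or "Organization"
        }
        for item in payload["items"]
    ]

def fetch_page(page):
    """Download and parse one result page (performs a network request)."""
    request = urllib.request.Request(
        build_search_url(page),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_fields(json.load(response))
```

Calling `fetch_page(1)` would return the first 100 repositories. Unauthenticated requests are rate limited, so a real crawler would add an access token and pause between pages.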

  8. (Some) Results

  9. Programming Language
    Star distributions differ across languages (Kruskal-Wallis test)
    … 50+ different languages
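
For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes its H statistic in plain Python from the standard textbook formula, with average ranks for ties and the usual tie correction. The function names and the star counts are hypothetical; the slide only names the test and does not show how it was run.

```python
def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic for two or more samples (requires at least two distinct values)."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    ties = sum(c ** 3 - c for c in counts.values())
    return h / (1 - ties / (n ** 3 - n))

# Hypothetical star counts for repositories in three languages.
stars_by_language = [[1200, 3400, 5600], [900, 1500, 2100], [7000, 8800, 9100]]
h_statistic = kruskal_wallis_h(stars_by_language)
```

With three groups (two degrees of freedom), an H value above 5.991 rejects the hypothesis of equal distributions at the 5% level under the chi-square approximation.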

  10. Application Domain
    Star distributions differ across application domains (Kruskal-Wallis test)

  11. Repository Owner
    Star distributions differ between owner types (Mann-Whitney test)
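
A two-sample comparison of this kind could be sketched as below. The code computes the Mann-Whitney U statistic by pair counting and a two-sided p-value from the normal approximation, without tie or continuity correction. The function name and the sample star counts are assumptions for illustration only.

```python
import math

def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p-value from the normal approximation)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # U counts the pairs where a value from A exceeds a value from B (ties count 0.5).
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    mean_u = n_a * n_b / 2
    std_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / std_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_a, p_value

# Hypothetical star counts: organization-owned vs user-owned repositories.
organization_stars = [5200, 8100, 9400, 12000]
user_stars = [1100, 2300, 3100, 4800]
u_statistic, p_value = mann_whitney_u(organization_stars, user_stars)
```

The normal approximation is rough for samples this small; it is used here only to keep the sketch dependency-free.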

  12. Correlation Analysis
    Age: no correlation
    Contributors: weak correlation
    Commits: weak correlation
    Forks: strong correlation
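
The slide does not name the correlation coefficient. As an assumption, the sketch below uses Spearman's rank correlation, a common choice for skewed data such as star counts, computed as the Pearson correlation of the ranks. The function names and the fork and star numbers are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical fork and star counts for five repositories.
forks = [120, 450, 900, 300, 2000]
stars = [1500, 4200, 7000, 5000, 15000]
rho = spearman_rho(forks, stars)
```

A value close to 1 means that repositories with more forks also tend to have more stars, which is the relationship the slide labels as strong.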

  13. Popularity Growth Patterns
    ● K-Spectral Centroid clustering algorithm
    ○ Clusters time series with similar shapes
    ○ Invariant to scaling and shifting
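
The two invariance properties can be illustrated with a distance of the form d(x, y) = min over α and q of ||x − α·y(q)|| / ||x||, where y(q) is y shifted by q time steps and α is a scaling factor. The sketch below assumes this form of the K-Spectral Centroid distance and covers only the distance, not the centroid update or the full clustering loop. The function names and the `max_shift` parameter are illustrative.

```python
import math

def shift(series, q):
    """Shift a series by q steps, padding with zeros (q > 0 shifts to the right)."""
    n = len(series)
    if q >= 0:
        return [0.0] * min(q, n) + list(series[:max(n - q, 0)])
    return list(series[-q:]) + [0.0] * min(-q, n)

def ksc_distance(x, y, max_shift=0):
    """Smallest relative residual of x against scaled and shifted copies of y."""
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must contain at least one non-zero value")
    best = math.inf
    for q in range(-max_shift, max_shift + 1):
        y_q = shift(y, q)
        energy = sum(v * v for v in y_q)
        # For a fixed shift, the optimal scaling has a closed form.
        alpha = sum(a * b for a, b in zip(x, y_q)) / energy if energy > 0 else 0.0
        residual = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, y_q)))
        best = min(best, residual / norm_x)
    return best

# Same shape at different scales: the distance is (numerically) zero.
same_shape = ksc_distance([1, 2, 3, 4], [10, 20, 30, 40])
# Same shape one step earlier: zero once a shift of one step is allowed.
shifted_shape = ksc_distance([0, 1, 2, 3], [1, 2, 3, 0], max_shift=1)
```

Because of this scaling invariance, a repository that grows from 100 to 1,000 stars and one that grows from 1,000 to 10,000 along the same curve would fall into the same cluster.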

  14. Popularity Growth Patterns
    Four patterns: Slow, Moderate, Fast, Viral

  15. Growth Patterns x Language
    [Chart: growth patterns (Slow, Moderate, Fast, Viral) by language; values shown: 92.6%, 4.5%, 1.1%, 1.6% and 38.7%, 51.9%, 7.6%, 1.6%]

  16. Growth Patterns x Domain
    [Chart: growth patterns (Slow, Moderate, Fast, Viral) by application domain; values shown: 75.5%, 18.2%, 4.9%, 1.2% and 51.9%, 33.5%, 9.8%, 4.6%]

  17. Developer Feedback
    A. Impact of account types (users vs organizations)
    B. Features implemented in successful releases
    C. Reasons for viral growth

  18. Repositories Owned by Users
    Repositories owned by organizations are more popular than
    the ones owned by individuals (RQ #1)
    Top-100 repositories → 30 users → 17 public emails → 5 responses

  19. Repositories Owned by Users
    1. Do you plan to migrate to an organization account?
    ➢ All developers answered negatively
    “I worked hard to create the project, and having it
    under my personal username is necessary to
    have proper credit for it.”

  20. Successful Releases
    60 releases with highest impact → 25 responses
    How were the features selected?

  21. Reasons for Viral Growth
    22 with viral growth → 14 responses
    How do you explain the peaks
    in the number of stars?

  22. Reasons for Viral Growth
    22 with viral growth → 14 responses
    How do you explain the peaks
    in the number of stars?
    “I posted about this project on
    HackerNews. It quickly got a lot of
    attention and remained on the front page
    of HackerNews for over 24 hours ...”

  23. Conclusion: Factors and Properties
    ➢ Domains/languages/owners may impact popularity
    ➢ Strong correlation between popularity and forks
    ➢ Repositories receive more stars right after creation
    and after releases

  24. Conclusion: Popularity Growth Patterns
    ➢ Slow, Moderate, Fast, and Viral
    ➢ Slow is the most common pattern
    ➢ Viral is the least common pattern

  25. Conclusion: Qualitative Study
    ➢ Developers look for recognition
    ➢ Popular features are identified by developers
    themselves
    ➢ Peaks of popularity are triggered by social
    media posts

  26. Future Work
    ➢ Popular vs unpopular repositories
    ➢ Causality analysis
    ➢ Popularity prediction

  27. Tool Support
    http://gittrends.io

  28. Thank you!