GitHub Insights: Understanding Open Source

Talk given at OSCON 2016

Georgios Gousios

May 19, 2016

Transcript

  1. GitHub Insights
    Understanding Open Source
    @jeffmcaffer – Microsoft
    Georgios Gousios – Delft University of Technology (TU Delft)
    Kevin Lewis – Microsoft

  2. Snapshot overview

  3. Inspire confidence

  4. How open is a project?
    http://ghtorrent.org/pullreq-perf/

  5. Commits (core vs community)

  6. Commits (origin)

  7. Comments (core vs community)

  8. PR lifelines

  9. Are we using git in a distributed way?

  10. How many devs are there per country?

  11. Insights

  12. Business insights

  13. Research insights

  14. Cross-domain insights

  15. Operational insights

  16. Approach
    Data for the masses

  17. GitHub by the numbers (Mid 2016)

  18. Approach
    http://ghtorrent.org

  19. How does it work?
    http://api.github.com/events
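
The slide points at GitHub's public events endpoint. As a rough illustration of what consuming it could look like, here is a minimal sketch that fetches one page of events and tallies them by type. The helper names (`fetch_events`, `count_event_types`) and the `User-Agent` string are made up for this example and are not taken from GHTorrent; the sketch also ignores pagination, authentication and rate limits.

```python
import json
import urllib.request
from collections import Counter

EVENTS_URL = "https://api.github.com/events"


def fetch_events(url=EVENTS_URL):
    """Fetch one page of public events as a list of dicts.

    Unauthenticated requests are rate limited, so a real collector
    would need tokens and polite polling; this sketch has neither.
    """
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "events-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_event_types(events):
    """Tally events by their "type" field (PushEvent, WatchEvent, ...)."""
    return Counter(event["type"] for event in events)


# Usage (performs a network call, so it is left commented out):
# print(count_event_types(fetch_events()).most_common())
```

Keeping the network call separate from the counting step makes the tally easy to try on a saved JSON file.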

  20. Example event (condensed)
    https://api.github.com/users/Cephei
    https://api.github.com/repos/PowerDMS/Owin.Scim
    https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
    https://api.github.com/orgs/PowerDMS
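
The four URLs above appear to be the entities an event links to: the actor, the repository, a commit and the organization. The sketch below shows one way such links might be collected from a push event. The sample event is hypothetical (names and SHA are placeholders, not the event on the slide), and `linked_urls` is an illustrative helper rather than GHTorrent code.

```python
def linked_urls(event):
    """Collect the API URLs of the entities referenced by one event."""
    urls = [event["actor"]["url"], event["repo"]["url"]]
    # Push events list their commits inside the payload.
    for commit in event.get("payload", {}).get("commits", []):
        urls.append(commit["url"])
    # The "org" key is only present for organization-owned repositories.
    if "org" in event:
        urls.append(event["org"]["url"])
    return urls


# Hypothetical, heavily condensed push event.
sample_event = {
    "type": "PushEvent",
    "actor": {"login": "some-user",
              "url": "https://api.github.com/users/some-user"},
    "repo": {"name": "some-org/some-repo",
             "url": "https://api.github.com/repos/some-org/some-repo"},
    "payload": {"commits": [
        {"sha": "abc123",
         "url": "https://api.github.com/repos/some-org/some-repo/commits/abc123"},
    ]},
    "org": {"login": "some-org",
            "url": "https://api.github.com/orgs/some-org"},
}

for url in linked_urls(sample_event):
    print(url)
```

Each returned URL is a candidate for a follow-up request that retrieves the full entity.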

  21. Entities

  22. GHTorrent architecture
    [Architecture diagram. Labels: GitHub API, Event Retrieval, Events, Commits Queue, Project Events Queue, evt.commit, evt.watch, evt.fork, Data Retrieval (repeated), Projects, Commits, Mirroring Cluster]
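
The diagram labels suggest that retrieved events are dispatched to queues under keys such as `evt.commit`, `evt.watch` and `evt.fork`, with separate data-retrieval workers consuming each one. The following is only an in-memory sketch of that idea. The mapping from GitHub event types to those keys is an assumption made for illustration, and plain `deque` objects stand in for whatever message broker the real system uses.

```python
from collections import defaultdict, deque

# Assumed mapping from GitHub event types to the keys shown in the diagram.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route(events, queues=None):
    """Append each event to the queue for its routing key.

    Events whose type has no key in ROUTING_KEYS are skipped.
    """
    if queues is None:
        queues = defaultdict(deque)
    for event in events:
        key = ROUTING_KEYS.get(event["type"])
        if key is not None:
            queues[key].append(event)
    return queues


queues = route([
    {"type": "PushEvent", "id": "1"},
    {"type": "ForkEvent", "id": "2"},
    {"type": "IssuesEvent", "id": "3"},  # no key in this sketch, so skipped
    {"type": "PushEvent", "id": "4"},
])
print({key: len(q) for key, q in queues.items()})
```

One queue per key lets each kind of retrieval worker scale independently, which is presumably the point of the separation in the diagram.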

  23. GHTorrent by the numbers

  24. Using the data
    You can do it too!

  25. Using the data: Hosted
    http://ghtorrent.org

  26. Using the data: Download

  27. Using the data: Self-service
    https://github.com/ghtorrent/ghtorrent-webhook

  28. Using the data: Azure Data Lake

  29. Resources
    http://ghtorrent.org
    https://github.com/Microsoft/ghinsights
    @gousiosg @jeffmcaffer @kelewis
