Slide 1

Slide 1 text

GitHub Insights: Understanding Open Source. @jeffmcaffer (Microsoft), Georgios Gousios (Delft University of Technology, TU Delft), Kevin Lewis (Microsoft)

Slide 2

Slide 2 text

Snapshot overview

Slide 3

Slide 3 text

Inspire confidence

Slide 4

Slide 4 text

How open is a project? http://ghtorrent.org/pullreq-perf/

Slide 5

Slide 5 text

Commits (core vs community)

Slide 6

Slide 6 text

Commits (origin)

Slide 7

Slide 7 text

Comments (core vs community)

Slide 8

Slide 8 text

PR lifelines

Slide 9

Slide 9 text

Are we using git in a distributed way?

Slide 10

Slide 10 text

How many devs are there per country?

Slide 11

Slide 11 text

Insights

Slide 12

Slide 12 text

Business insights

Slide 13

Slide 13 text

Research insights

Slide 14

Slide 14 text

Cross-domain insights

Slide 15

Slide 15 text

Operational insights

Slide 16

Slide 16 text

Approach: Data for the masses

Slide 17

Slide 17 text

GitHub by the numbers (mid-2016)

Slide 18

Slide 18 text

Approach: http://ghtorrent.org

Slide 19

Slide 19 text

How does it work? http://api.github.com/events
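As a rough illustration of what reading this endpoint could look like, here is a minimal Python sketch. It is not GHTorrent's retrieval code. It assumes only the public behavior of the GitHub events endpoint: a JSON array of events, each with an `id`, and support for conditional requests through `ETag` / `If-None-Match`. The function names `fetch_events` and `unseen`, and the `User-Agent` value, are made up for this example.

```python
import json
import urllib.error
import urllib.request

EVENTS_URL = "https://api.github.com/events"


def fetch_events(url=EVENTS_URL, etag=None):
    """Fetch one page of public events. Returns (events, etag).

    Passing the ETag from the previous call makes the request conditional;
    an unchanged feed comes back as HTTP 304, reported here as an empty list.
    """
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "events-sketch",  # hypothetical client name
    }
    if etag:
        headers["If-None-Match"] = etag
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # nothing new since the last poll
            return [], etag
        raise


def unseen(events, seen_ids):
    """Return only events whose id is not in seen_ids, and record them."""
    fresh = [event for event in events if event["id"] not in seen_ids]
    seen_ids.update(event["id"] for event in fresh)
    return fresh
```

A polling loop would call `fetch_events` repeatedly, pass each result through `unseen` (consecutive pages of the feed overlap), and wait between calls. Unauthenticated requests are rate limited, so a real collector would authenticate.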

Slide 20

Slide 20 text

Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
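The URLs on this slide suggest how a single event points at other entities (user, repository, commit, organization) that can be fetched in turn. The sketch below shows one way to pull those links out of an event. `SAMPLE_EVENT` is a hand-built stand-in that contains only the URLs from the slide, not the real payload, and treating it as a `PushEvent` is an assumption. The helper name `linked_urls` is hypothetical.

```python
# Hand-built stand-in for the condensed event on the slide (not real API output).
# The event type is an assumption; the commit URL is copied as shown on the slide.
SAMPLE_EVENT = {
    "type": "PushEvent",
    "actor": {"url": "https://api.github.com/users/Cephei"},
    "repo": {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
    "payload": {
        "commits": [
            {
                "url": "https://api.github.com/repos/PowerDMS/Owin.Scim"
                       "/commits/c751014f634d73e0b72f78a53c8cf137888b3"
            }
        ]
    },
    "org": {"url": "https://api.github.com/orgs/PowerDMS"},
}


def linked_urls(event):
    """Collect the API URLs an event refers to: actor, repo, commits, org."""
    urls = []
    for key in ("actor", "repo"):
        entity = event.get(key)
        if entity and "url" in entity:
            urls.append(entity["url"])
    # Only push-style events carry a list of commits in their payload.
    for commit in event.get("payload", {}).get("commits", []):
        if "url" in commit:
            urls.append(commit["url"])
    org = event.get("org")  # absent for events outside an organization
    if org and "url" in org:
        urls.append(org["url"])
    return urls
```

Each returned URL is a candidate for a follow-up request, which is how one event can fan out into several entity records.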

Slide 21

Slide 21 text

Entities

Slide 22

Slide 22 text

GHTorrent architecture (diagram). Components shown: GitHub API; Event Retrieval; Events; Project Events Queue; Commits Queue; Data Retrieval (four instances); Projects; Commits; Mirroring Cluster. Labels shown: evt.commit, evt.watch, evt.fork.
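The diagram's labels (evt.commit, evt.watch, evt.fork) hint at events being sorted by type into queues that separate data-retrieval workers consume. A toy Python sketch of that pattern follows. It uses in-memory deques as a stand-in for whatever message broker the real system uses, and the mapping from GitHub event types to the diagram's labels is an assumption. `ROUTING_KEYS`, `route`, and `drain` are names invented for this example.

```python
from collections import defaultdict, deque

# Assumed mapping from GitHub event types to the labels in the diagram.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route(events, queues=None):
    """Append each event to the queue for its routing key; skip unknown types."""
    if queues is None:
        queues = defaultdict(deque)
    for event in events:
        key = ROUTING_KEYS.get(event["type"])
        if key is not None:
            queues[key].append(event)
    return queues


def drain(queues, key, handler):
    """Play the role of one data-retrieval worker: consume a queue in order."""
    results = []
    while queues[key]:
        results.append(handler(queues[key].popleft()))
    return results
```

For example, `drain(route(events), "evt.commit", lambda e: e["id"])` would hand every push event to a commit-retrieval handler while leaving the watch and fork queues untouched for other workers.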

Slide 23

Slide 23 text

GHTorrent by the numbers

Slide 24

Slide 24 text

Using the data: You can do it too!

Slide 25

Slide 25 text

Using the data: Hosted http://ghtorrent.org

Slide 26

Slide 26 text

Using the data: Download

Slide 27

Slide 27 text

Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook

Slide 28

Slide 28 text

Using the data: Azure Data Lake

Slide 29

Slide 29 text

Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis