Slide 1

Slide 1 text

Lean GHTorrent
Georgios Gousios, Bogdan Vasilescu, Alexander Serebrenik, Andy Zaidman
@gousiosg

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

MSR → 19 GB

Slide 6

Slide 6 text

MSR → 19 GB
VISSOFT → 0.5 GB

Slide 7

Slide 7 text

MSR → 19 GB
VISSOFT → 0.5 GB
GHTorrent → 3.5 TB

Slide 8

Slide 8 text

MSR → 19 GB
VISSOFT → 0.5 GB
GHTorrent → 3.5 TB
Sun = 109x Earth!
GHTorrent = 184x MSR
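
The 184x figure can be checked from the sizes on the slide. The sketch below assumes decimal units (1 TB = 1000 GB), which is an assumption and not stated on the slide; the constant and function names are illustrative only.

```python
# Dataset sizes as given on the slide.
MSR_GB = 19.0
VISSOFT_GB = 0.5
GHTORRENT_TB = 3.5

# Assumption: decimal units, 1 TB = 1000 GB.
GB_PER_TB = 1000


def size_ratio(larger_gb: float, smaller_gb: float) -> float:
    """Return how many times larger the first dataset is than the second."""
    return larger_gb / smaller_gb


ghtorrent_gb = GHTORRENT_TB * GB_PER_TB
ghtorrent_vs_msr = size_ratio(ghtorrent_gb, MSR_GB)
ghtorrent_vs_vissoft = size_ratio(ghtorrent_gb, VISSOFT_GB)

print(f"GHTorrent is about {ghtorrent_vs_msr:.0f}x the MSR dataset")
print(f"GHTorrent is about {ghtorrent_vs_vissoft:.0f}x the VISSOFT dataset")
```

With binary units (1 TB = 1024 GB) the ratio to MSR would come out slightly higher, so the slide's 184x is consistent with the decimal reading.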

Slide 9

Slide 9 text

I need a fortune for H/W
I need an army of researchers
Replication?

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

There is a solution!

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

VS

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Lean GHTorrent: GitHub data on demand
Georgios Gousios, Bogdan Vasilescu, Alexander Serebrenik and Andy Zaidman
{g.gousios, a.e.zaidman}@tudelft.nl, {b.n.vasilescu, a.serebrenik}@tue.nl
@gousiosg
http://ghtorrent.org/lean.html
Software Engineering Research Group, Delft University of Technology
http://swerl.tudelft.nl/

Want to do research with GHTorrent data? It is now as easy as:
1. Filling in the form at ghtorrent.org/lean.html
2. Getting the data!

In the package, you will find, for all repos specified in step 1:
• A MySQL dump (to query like a boss)
• MongoDB collection dumps (all GitHub API data)

[Figure: architecture diagram with numbered steps 1 to 9. Components: Web form, Web server, Request listener, Requests db, Requests queue, Dispatcher, Job db, Retrieval workers, Responses queue, Response listener, GHTorrent server, GHTorrent db, GitHub API]
No need to care about this (but ask if you do!)
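
For readers who do care about the diagram, the general pattern it names (a requests queue feeding retrieval workers, which publish to a responses queue) can be sketched as follows. This is a minimal in-process illustration using Python threads, not the GHTorrent implementation: `fake_retrieve`, `worker`, and `run_jobs` are hypothetical names, the GitHub API call is stubbed out, and the real system's message broker, databases, and step ordering are not modeled.

```python
import queue
import threading


def fake_retrieve(repo: str) -> dict:
    """Hypothetical stand-in for a retrieval worker's GitHub API calls."""
    return {"repo": repo, "status": "done"}


def worker(requests: queue.Queue, responses: queue.Queue) -> None:
    """Take repos off the requests queue and publish results as responses."""
    while True:
        repo = requests.get()
        if repo is None:  # sentinel: no more work for this worker
            break
        responses.put(fake_retrieve(repo))


def run_jobs(repos: list, n_workers: int = 2) -> list:
    """Dispatch repos to workers and collect every response."""
    requests: queue.Queue = queue.Queue()
    responses: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(requests, responses))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for repo in repos:
        requests.put(repo)
    for _ in threads:
        requests.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    results = []
    while not responses.empty():
        results.append(responses.get())
    # Workers finish in arbitrary order, so sort for a stable result.
    return sorted(results, key=lambda r: r["repo"])


if __name__ == "__main__":
    for result in run_jobs(["owner-a/repo-1", "owner-b/repo-2"]):
        print(result)
```

Decoupling the web form from the workers through queues means a slow retrieval does not block new requests from being accepted.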