Slide 1

Slide 1 text

My adventures with open .*
Georgios Gousios, TU Delft / EWI, @gousiosg

Slide 2

Slide 2 text

how I got trapped
• 1997: first read about open source
• 1999: installed Linux
• 2001: wrote a how-to for the Linux Documentation Project
• 2003: contributor to the KDE 3.x/Kaffeine media player
• 2006: leading the development of Alitheia Core
• 2008: google summer of code participant
• 2008: founding member, greek OSS society
• 2010: work on OSS cloud infrastructures
• 2011: started the GHTorrent project

Slide 3

Slide 3 text

alitheia core 50k LOC!

Slide 4

Slide 4 text

demo.sqo-oss.org

Slide 5

Slide 5 text

alitheia core in numbers
• 750 OSS repositories, 1.5GB data dump
• the most refined software engineering dataset at the time
• supported by an EC FP6 project
• 6 partners
• ~20 publications
• 4 PhDs funded, mine included

Slide 6

Slide 6 text

alitheia core impact
• 2 external publications
• 1 external user
• 0 industry adopters

Slide 7

Slide 7 text

api.github.com/v3

Slide 8

Slide 8 text

recursive dependency retrieval
• PushEvent
• /users/:user → ensure_user
• /repos/:user/:repo/ → ensure_repo
• /repos/:user/:repo/commits → ensure_commits, ensure_user
• /:user/:repo/sha → ensure_commit, ensure_user
• /users/:user/followers → ensure_followers
• /repos/:user/:repo/commits/:sha/comments → ensure_commit_comments
• /users/:user/orgs → ensure_orgs
• /orgs/:org/teams → ensure_teams
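The retrieval idea on this slide can be sketched as follows: each `ensure_*` step returns an entity from local storage if it is already there, and only otherwise calls the API, after first ensuring whatever the entity depends on (a repository needs its owner, a commit needs its repository and its author). This is a minimal Python sketch under stated assumptions, not GHTorrent's own implementation: `FAKE_API`, `Retriever`, the in-memory `db` dict, the path layout and the record fields are all hypothetical stand-ins for a real API client and database.

```python
from typing import Any, Callable, Dict, List

# Hypothetical canned responses keyed by path; a stand-in for api.github.com.
FAKE_API: Dict[str, Dict[str, Any]] = {
    "/users/alice": {"login": "alice"},
    "/users/bob": {"login": "bob"},
    "/repos/alice/proj": {"name": "proj", "owner": "alice"},
    "/repos/alice/proj/commits/abc123": {"sha": "abc123", "author": "bob"},
}


class Retriever:
    """Recursive dependency retrieval: fetch an entity only when it is missing,
    after ensuring the entities it references."""

    def __init__(self, api: Callable[[str], Dict[str, Any]]) -> None:
        self.api = api
        self.db: Dict[str, Dict[str, Any]] = {}  # hypothetical stand-in for the database
        self.calls: List[str] = []  # API paths actually requested, in order

    def _fetch(self, path: str) -> Dict[str, Any]:
        self.calls.append(path)
        return self.api(path)

    def ensure_user(self, user: str) -> Dict[str, Any]:
        key = f"/users/{user}"
        if key not in self.db:  # only hit the API on a miss
            self.db[key] = self._fetch(key)
        return self.db[key]

    def ensure_repo(self, user: str, repo: str) -> Dict[str, Any]:
        key = f"/repos/{user}/{repo}"
        if key not in self.db:
            self.ensure_user(user)  # a repo depends on its owner
            self.db[key] = self._fetch(key)
        return self.db[key]

    def ensure_commit(self, user: str, repo: str, sha: str) -> Dict[str, Any]:
        key = f"/repos/{user}/{repo}/commits/{sha}"
        if key not in self.db:
            self.ensure_repo(user, repo)  # a commit depends on its repo...
            commit = self._fetch(key)
            self.ensure_user(commit["author"])  # ...and on its author
            self.db[key] = commit
        return self.db[key]


if __name__ == "__main__":
    r = Retriever(FAKE_API.__getitem__)
    r.ensure_commit("alice", "proj", "abc123")
    r.ensure_commit("alice", "proj", "abc123")  # answered from db, no new calls
    print(r.calls)
```

Because every step checks storage before calling the API, asking for the same commit twice costs no extra requests, and a commit whose repository and author are already stored costs a single request.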

Slide 9

Slide 9 text

relational database

Slide 10

Slide 10 text

noSQL database as cache
• collections: repositories, users, organizations, issues
• API endpoints: /users/:user, /user/repos, /repos/:user/:repo/issues, /orgs/:org
• example document: { "type": "User", "public_gists": 0, "login": "gousiosg", "followers": 8, "name": "Georgios Gousios", "public_repos": 4, "created_at": ..., "id": 386172, "following": 4 }
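One way to read "noSQL database as cache" is the cache-aside pattern: the full JSON response for each API URL is kept in a per-entity collection, the API is only called on a miss, and the relational side extracts just the fields it needs from the cached document. The minimal Python sketch below assumes that reading; `DocumentCache` and `fake_fetch` are hypothetical, and a nested dict stands in for a real document store.

```python
import json
from typing import Any, Callable, Dict


class DocumentCache:
    """Cache-aside sketch: raw API JSON kept per collection, keyed by URL."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch  # returns the raw JSON body for a URL
        self.collections: Dict[str, Dict[str, Dict[str, Any]]] = {}
        self.misses = 0  # number of times the API was actually called

    def get(self, collection: str, url: str) -> Dict[str, Any]:
        docs = self.collections.setdefault(collection, {})
        if url not in docs:  # miss: call the API once, keep the whole document
            self.misses += 1
            docs[url] = json.loads(self.fetch(url))
        return docs[url]


def fake_fetch(url: str) -> str:
    # Hypothetical stub standing in for an HTTP GET against the API.
    login = url.rsplit("/", 1)[-1]
    return json.dumps({"type": "User", "login": login, "followers": 8})


if __name__ == "__main__":
    cache = DocumentCache(fake_fetch)
    user = cache.get("users", "/users/gousiosg")
    # The relational side keeps only selected fields; the raw document stays cached.
    row = (user["login"], user["followers"])
    cache.get("users", "/users/gousiosg")  # served from the cache
    print(row, cache.misses)
```

Keeping the whole response, rather than only the extracted columns, means fields that the relational schema does not use today can still be recovered later without querying the API again.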

Slide 11

Slide 11 text

periodic dumps of DBs online
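Publishing periodic dumps amounts to writing a dated snapshot of each database that others can download. The sketch below assumes SQLite (via Python's `sqlite3.Connection.iterdump`) as a stand-in for whatever database is actually running; `dump_database` and the `dump-YYYY-MM-DD.sql` naming scheme are hypothetical choices, and scheduling the job (for example with cron) and uploading the file are left out.

```python
import datetime as dt
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional


def dump_database(conn: sqlite3.Connection, out_dir: Path,
                  day: Optional[dt.date] = None) -> Path:
    """Write a plain-SQL snapshot of `conn` to a date-stamped file; return its path."""
    day = day or dt.date.today()
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"dump-{day.isoformat()}.sql"
    with path.open("w", encoding="utf-8") as fh:
        # iterdump() yields the SQL statements that recreate schema and rows.
        for statement in conn.iterdump():
            fh.write(statement + "\n")
    return path


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (login TEXT, followers INTEGER)")
    conn.execute("INSERT INTO users VALUES ('gousiosg', 8)")
    conn.commit()
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = dump_database(conn, Path(tmp))
        print(snapshot.name, snapshot.stat().st_size, "bytes")
```

A plain-SQL dump is larger than a binary backup, but anyone can load it without sharing the producer's exact setup.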

Slide 12

Slide 12 text

ghtorrent facts
• 1 developer, no external funding
• 3 papers
• advertised on social media
• since 2012

Slide 13

Slide 13 text

ghtorrent impact
• 300+ external users
• 150+ external papers
• msr14, vissoft14, github data mining challenge
• 40% of all papers on GitHub (Cosentino et al. 2016)
• many best paper awards
• used at: microsoft, deloitte, black duck
• received funding from: microsoft, google

Slide 14

Slide 14 text

why such a difference?
• github is hot as a research target!
  • true, but so was SourceForge when Alitheia Core analysed it
• (alitheia core was) not invented here!
  • true, but GHTorrent was of worse quality when it became available
• I don’t want to invest time in your infrastructure!
  • true, but you still do so with GHTorrent (ok, less)

Slide 15

Slide 15 text

be open or be irrelevant

Slide 16

Slide 16 text

Tools Datasets

Slide 17

Slide 17 text

what to open?
• at the very least:
  • tools
  • datasets
• but also:
  • papers
  • talk slides
  • lecture notes
  • technical designs
  • (successful?) research proposals

Slide 18

Slide 18 text

how to open?
• choose a license
  • BSD or MIT for source code
  • CC-BY-SA for data and other materials
• choose a platform
  • github for src
  • zenodo for data, gives a DOI!
  • slideshare or speakerdeck for slides
  • figshare, pure.tudelft.nl or your site for papers

Slide 20

Slide 20 text

open now trumps open when it’s done

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

how to open now?
• think in terms of a Minimum Viable Product
  • what is the least possible amount of work that will make sense to somebody else?
• work in iterations
  • open, gather feedback, improve, repeat
• embrace the “Hacker Way”

Slide 23

Slide 23 text

but somebody will steal my data/code/ideas!
it feels amazing to have created something worth stealing!
• if someone invests time in stealing:
  • what you created is great
  • you have a head start
• if nobody invests time in stealing:
  • is what you created worth your time/effort?
  • is your research relevant?
good artists copy; great artists steal

Slide 24

Slide 24 text

“The problem in this business isn't to keep people from stealing your ideas; it is making them steal your ideas”
– Howard H. Aiken
@gousiosg