My adventures with open everything

Georgios Gousios
September 20, 2017

Transcript

  1. how I got trapped

    1997 1999 first read about open source, installed Linux
    2001 wrote a HOWTO for the Linux Documentation Project, contributor to the KDE 3.x/Kaffeine media player
    2003 2006 leading the development of Alitheia Core
    2008 google summer of code participant, founding member of the greek OSS society
    2008 2010 work on OSS cloud infrastructures
    2011 started the GHTorrent project
  2. alitheia core in numbers

    • 750 OSS repositories, 1.5GB data dump
    • the most refined software engineering dataset at the time
    • supported by an EC FP6 project
    • 6 partners
    • ~20 publications
    • 4 PhDs, mine included, funded
  3. recursive dependency retrieval

    <<event>> PushEvent
    <<api>> /users/:user : ensure_user
    <<api>> /repos/:user/:repo/ : ensure_repo
    <<api>> /repos/:user/:repo/commits : ensure_commits, ensure_user
    <<api>> /:user/:repo/sha : ensure_commit, ensure_user
    <<api>> /users/:user/followers : ensure_followers
    <<api>> /repos/:user/:repo/commits/:sha/comments : ensure_commit_comments
    <<api>> /users/:user/orgs : ensure_orgs
    <<api>> /orgs/:org/teams : ensure_teams
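A minimal sketch of the recursive dependency retrieval idea on this slide, not GHTorrent's actual code. The `ensure_*` names come from the slide; the in-memory `FAKE_API`, the `db` dictionary, `api_get` and `handle_push_event` are hypothetical stand-ins so that the example runs without network access. As a simplification, `ensure_commit` here stores the commit it is handed instead of calling the API again. Each `ensure_*` function first makes sure the entities it depends on exist locally, and only calls the API for what is missing.

```python
# Hypothetical in-memory stand-in for the GitHub API; no network calls are made.
FAKE_API = {
    "/users/alice": {"login": "alice"},
    "/users/bob": {"login": "bob"},
    "/repos/alice/demo": {"name": "demo", "owner": "alice"},
    "/repos/alice/demo/commits": [
        {"sha": "c1", "author": "alice"},
        {"sha": "c2", "author": "bob"},
    ],
}

db = {"users": {}, "repos": {}, "commits": {}}  # stand-in for the local store
api_calls = []                                   # every API path requested, in order


def api_get(path):
    """Record the request and return the canned response."""
    api_calls.append(path)
    return FAKE_API[path]


def ensure_user(login):
    """Fetch the user only if it is not stored yet."""
    if login not in db["users"]:
        db["users"][login] = api_get(f"/users/{login}")
    return db["users"][login]


def ensure_repo(owner, name):
    """A repository depends on its owner, so the owner is ensured first."""
    key = (owner, name)
    if key not in db["repos"]:
        ensure_user(owner)
        db["repos"][key] = api_get(f"/repos/{owner}/{name}")
    return db["repos"][key]


def ensure_commit(owner, name, commit):
    """A commit depends on its author (simplified: no per-commit API call)."""
    key = (owner, name, commit["sha"])
    if key not in db["commits"]:
        ensure_user(commit["author"])
        db["commits"][key] = commit
    return db["commits"][key]


def ensure_commits(owner, name):
    """Commits depend on the repository, which in turn depends on its owner."""
    ensure_repo(owner, name)
    for commit in api_get(f"/repos/{owner}/{name}/commits"):
        ensure_commit(owner, name, commit)


def handle_push_event(owner, name):
    """Entry point: a PushEvent triggers retrieval of everything it touches."""
    ensure_commits(owner, name)


handle_push_event("alice", "demo")
print(api_calls)
```

In this sketch the user `alice` is fetched once even though she is both the repository owner and a commit author; the second lookup is answered from the local store.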
  4. noSQL database as cache

    repositories, users, organizations, issues
    /users/:user
    /user/repos
    /repos/:user/:repo/issues
    /orgs/:org
    { "type": "User", "public_gists": 0, "login": "gousiosg", "followers": 8, "name": "Georgios Gousios", "public_repos": 4, "created_at": ..., "id": 386172, "following": 4, }
    { . . .
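One way to read "noSQL database as cache" is that raw API responses are stored as JSON documents and served from the store on repeat requests. The following is a sketch under that assumption only: `raw_store` is a plain dictionary standing in for a document collection, and `fetch_from_api` and `cached_get` are hypothetical names, not GHTorrent functions. The sample fields are taken from the JSON shown on the slide.

```python
import json

raw_store = {}   # stand-in for a document (noSQL) collection, keyed by API path
fetch_count = 0  # how many times the (fake) API was actually hit


def fetch_from_api(path):
    """Hypothetical stand-in for an HTTP GET; returns a JSON string."""
    global fetch_count
    fetch_count += 1
    return json.dumps({"type": "User", "login": "gousiosg", "id": 386172})


def cached_get(path):
    """Return the stored document for path, fetching it only on a miss."""
    if path not in raw_store:
        raw_store[path] = json.loads(fetch_from_api(path))
    return raw_store[path]


first = cached_get("/users/gousiosg")
second = cached_get("/users/gousiosg")
print(first["login"], fetch_count)
```

Keeping the unprocessed documents around means later processing steps can be re-run against the store without spending API requests again.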
  5. ghtorrent facts

    • 1 developer, no external funding
    • 3 papers
    • advertised on social media
    • since 2012
  6. ghtorrent impact

    300+ external users
    150+ external papers
    msr14, vissoft14, github data mining challenge
    40% of all papers on GitHub (Cosentino et al. 2016)
    many best paper awards
    used at: microsoft, deloitte, blackduck
    received funding from: microsoft, google
  7. why such a difference?

    • github is hot as a research target!
      • true, but so was Sourceforge when Alitheia Core analysed it
    • (alitheia core was) not invented here!
      • true, but GHTorrent was of worse quality when available
    • i don’t want to invest time in your infrastructure!
      • true, but you still do it with GHTorrent (ok, less)
  8. what to open?

    at the very least: tools, datasets
    but also: papers, talk slides, lecture notes, technical designs, (successful?) research proposals
  9. how to open?

    • choose a license
      • BSD or MIT for source code
      • CC-BY-SA for data and other materials
    • choose a platform
      • github for src
      • zenodo for data, gives a DOI!
      • slideshare or speakerdeck for slides
      • figshare, pure.tudelft.nl or your site for papers
  11. how to open now?

    • think in terms of Minimum Viable Product
      • what is the least possible amount of work that will make sense to somebody else?
    • work in iterations
      • open, gather feedback, improve, repeat
    • Embrace the “Hacker Way”
  12. but somebody will steal my data/code/ideas!

    it feels amazing to have created something worth stealing!
    • if someone invests time in stealing:
      • what you created is great
      • you have a head start
    • if nobody invests time in stealing:
      • is what you created worth your time/effort?
      • is your research relevant?
    good artists copy; great artists steal
  13. “The problem in this business isn't to keep people from stealing your ideas; it is making them steal your ideas”

    –Howard H. Aiken
    @gousiosg