My adventures with open everything

Georgios Gousios
September 20, 2017

Transcript

  1. My adventures with open .*
    Georgios Gousios, TU Delft / EWI, @gousiosg
  2. how I got trapped
    • 1997, 1999: first read about open source; installed Linux
    • 2001: wrote a how-to for the linux doc project; contributor to the KDE 3.x/Kaffeine media player
    • 2003, 2006: leading the development of Alitheia Core
    • 2008: google summer of code participant; founding member, greek OSS society
    • 2008, 2010: work on OSS cloud infrastructures
    • 2011: started the GHTorrent project
  3. alitheia core 50k LOC!

  4. demo.sqo-oss.org

  5. alitheia core in numbers
    • 750 OSS repositories, 1.5GB data dump
    • the most refined software engineering dataset at the time
    • supported by an EC FP6 project
    • 6 partners
    • ~20 publications
    • 4 PhDs, mine included, funded
  6. alitheia core impact
    • 2 external publications
    • 1 external user
    • 0 industry adoption
  7.   api.github.com/v3

  8. recursive dependency retrieval
    • <<event>> PushEvent
    • <<api>> /users/:user : ensure_user
    • <<api>> /repos/:user/:repo/ : ensure_repo
    • <<api>> /repos/:user/:repo/commits : ensure_commits, ensure_user
    • <<api>> /:user/:repo/sha : ensure_commit, ensure_user
    • <<api>> /users/:user/followers : ensure_followers
    • <<api>> /repos/:user/:repo/commits/:sha/comments : ensure_commit_comments
    • <<api>> /users/:user/orgs : ensure_orgs
    • <<api>> /orgs/:org/teams : ensure_teams
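
The recursive dependency retrieval on this slide can be illustrated with a minimal sketch. It is not GHTorrent's actual code: `FAKE_API`, `fetch`, `db` and `on_push_event` are hypothetical names, the in-memory dictionaries stand in for the GitHub API and the relational database, and the URL paths and the `author` field are simplified assumptions. Only the `ensure_*` naming comes from the slide. Each `ensure_*` function first makes sure the entities it depends on exist, then retrieves its own entity only if it is missing.

```python
# Hypothetical stand-in for the GitHub API: URL path -> JSON-like document.
FAKE_API = {
    "/users/alice": {"login": "alice"},
    "/users/bob": {"login": "bob"},
    "/repos/alice/demo": {"name": "demo", "owner": "alice"},
    "/repos/alice/demo/commits/abc123": {"sha": "abc123", "author": "bob"},
}

# Stands in for the relational database (one dict per "table").
db = {"users": {}, "repos": {}, "commits": {}}

# Log of simulated API requests, in the order they were made.
api_calls = []


def fetch(path):
    """Simulate one API request and record it."""
    api_calls.append(path)
    return FAKE_API[path]


def ensure_user(login):
    """Retrieve the user only if it is not stored yet."""
    if login not in db["users"]:
        db["users"][login] = fetch(f"/users/{login}")
    return db["users"][login]


def ensure_repo(owner, name):
    """A repository depends on its owner, so ensure the owner first."""
    key = (owner, name)
    if key not in db["repos"]:
        ensure_user(owner)
        db["repos"][key] = fetch(f"/repos/{owner}/{name}")
    return db["repos"][key]


def ensure_commit(owner, name, sha):
    """A commit depends on its repository and on its author."""
    if sha not in db["commits"]:
        ensure_repo(owner, name)
        commit = fetch(f"/repos/{owner}/{name}/commits/{sha}")
        ensure_user(commit["author"])
        db["commits"][sha] = commit
    return db["commits"][sha]


def on_push_event(owner, name, shas):
    """Entry point: a push event triggers retrieval of its commits."""
    for sha in shas:
        ensure_commit(owner, name, sha)


on_push_event("alice", "demo", ["abc123"])
```

Handling the same event a second time issues no further requests in this sketch, because every dependency is already stored.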
  9. relational database

  10. noSQL database as cache
    • repositories, users, organizations, issues
    • /users/:user, /user/repos, /repos/:user/:repo/issues, /orgs/:org
    • { "type": "User", "public_gists": 0, "login": "gousiosg", "followers": 8, "name": "Georgios Gousios", "public_repos": 4, "created_at": ..., "id": 386172, "following": 4 } { . . .
  11. periodic dumps of DBs online
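
The "noSQL database as cache" idea from slide 10 can be sketched as a store that keeps each raw API response as a document keyed by its URL path, so a repeated request is answered locally. This is only an illustration under assumptions: `RawResponseCache` and `fake_fetch` are hypothetical names, a plain dictionary stands in for the noSQL database, and the slides do not say how the real cache is keyed.

```python
import copy


class RawResponseCache:
    """Keeps raw API responses as documents, keyed by URL path."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that performs the real request
        self._docs = {}       # stands in for a noSQL collection
        self.misses = 0       # number of requests that reached the API

    def get(self, path):
        """Return the document for path, fetching it at most once."""
        if path not in self._docs:
            self.misses += 1
            self._docs[path] = self._fetch(path)
        # Hand out a copy so callers cannot alter the cached document.
        return copy.deepcopy(self._docs[path])


def fake_fetch(path):
    """Hypothetical stand-in for an HTTP GET against the API."""
    return {"type": "User", "login": path.rsplit("/", 1)[-1]}


cache = RawResponseCache(fake_fetch)
first = cache.get("/users/gousiosg")
second = cache.get("/users/gousiosg")  # served from the cache
```

Keeping the untouched response lets the structured data be rebuilt later without asking the API again, which is one plausible reason to place such a cache in front of the relational database.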

  12. ghtorrent facts
    • 1 developer, no external funding
    • 3 papers
    • advertised on social media
    • since 2012
  13. ghtorrent impact
    • 300+ external users
    • 150+ external papers
    • msr14, vissoft14, github data mining challenge
    • 40% of all papers on GitHub (Cosentino et al. 2016)
    • many best paper awards
    • used at: microsoft, deloitte, blackduck
    • received funding from: microsoft, google
  14. why such a difference?
    • github is hot as a research target!
        • true, but so was Sourceforge when Alitheia Core analysed it
    • (alitheia core was) not invented here!
        • true, but GHTorrent was of worse quality when available
    • i don’t want to invest time in your infrastructure!
        • true, but you still do it with GHTorrent (ok, less)
  15. be open or be irrelevant

  16. Tools Datasets

  17. what to open?
    • at the very least: tools, datasets
    • but also: papers, talk slides, lecture notes, technical designs, (successful?) research proposals
  18. how to open?
    • choose a license
        • BSD or MIT for source code
        • CC-BY-SA for data and other materials
    • choose a platform
        • github for src
        • zenodo for data, gives a DOI!
        • slideshare or speakerdeck for slides
        • figshare, pure.tudelft.nl or your site for papers
  20. open now trumps open when it’s done

  21. None
  22. how to open now?
    • think in terms of Minimum Viable Product
        • what is the least possible amount of work that will make sense to somebody else?
    • work in iterations
        • open, gather feedback, improve, repeat
    • Embrace the “Hacker Way”
  23. but somebody will steal my data/code/ideas!
    it feels amazing to have created something worth stealing!
    • if someone invests time in stealing:
        • what you created is great
        • you have a head start
    • if nobody invests time in stealing:
        • is what you created worth your time/effort?
        • is your research relevant?
    good artists copy; great artists steal
  24. “The problem in this business isn't to keep people from stealing your ideas; it is making them steal your ideas” (Howard H. Aiken)
    @gousiosg