
My adventures with open everything

Georgios Gousios
September 20, 2017

Transcript

  1. My adventures with
    open .*
    Georgios Gousios

    TU Delft / EWI
    @gousiosg


  2. how I got trapped
    1997 first read about open source
    1999 installed Linux
    2001 wrote a HOWTO for the Linux Documentation Project
    2003 contributor to the KDE 3.x/Kaffeine media player
    2006 leading the development of Alitheia Core
    2008 google summer of code participant
    2008 founding member, greek OSS society
    2010 work on OSS cloud infrastructures
    2011 started the GHTorrent project


  3. alitheia core
    50k LOC!


  4. demo.sqo-oss.org


  5. alitheia core in numbers
    • 750 OSS repositories, 1.5GB data dump

    • the most refined software engineering dataset at the time

    • supported by an EC FP6 project

    • 6 partners

    • ~20 publications

    • 4 PhDs, mine included, funded


  6. alitheia core impact
    2 external publications
    1 external user
    0 industry adoption




  7. api.github.com/v3


  8. recursive dependency retrieval
    PushEvent
    /users/:user → ensure_user
    /repos/:user/:repo/ → ensure_repo
    /repos/:user/:repo/commits → ensure_commits, ensure_user
    /:user/:repo/sha → ensure_commit, ensure_user
    /users/:user/followers → ensure_followers
    /repos/:user/:repo/commits/:sha/comments → ensure_commit_comments
    /users/:user/orgs → ensure_orgs
    /orgs/:org/teams → ensure_teams
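
The slide only names the endpoints and the `ensure_*` steps, so what follows is a minimal sketch of the idea, not GHTorrent's actual code. It assumes that each `ensure_*` step first makes sure the entities it depends on are stored, and fetches an entity only when it is missing. The `Mirror` class, the `FAKE_API` dictionary (an in-memory stand-in for api.github.com), the event shape and the `author` field are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for API responses, keyed by request path.
FAKE_API: Dict[str, dict] = {
    "/users/alice": {"login": "alice"},
    "/repos/alice/demo": {"name": "demo", "owner": "alice"},
    "/repos/alice/demo/commits/abc123": {"sha": "abc123", "author": "alice"},
}


class Mirror:
    """Hypothetical mirror that stores an entity only after its dependencies."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self.fetch = fetch                 # function that retrieves one path
        self.db: Dict[str, dict] = {}      # entities stored so far
        self.requests: List[str] = []      # log of paths actually fetched

    def _ensure(self, path: str) -> dict:
        # Fetch only when the entity is not stored yet.
        if path not in self.db:
            self.requests.append(path)
            self.db[path] = self.fetch(path)
        return self.db[path]

    def ensure_user(self, user: str) -> dict:
        return self._ensure(f"/users/{user}")

    def ensure_repo(self, user: str, repo: str) -> dict:
        self.ensure_user(user)             # a repo depends on its owner
        return self._ensure(f"/repos/{user}/{repo}")

    def ensure_commit(self, user: str, repo: str, sha: str) -> dict:
        self.ensure_repo(user, repo)       # a commit depends on its repo
        commit = self._ensure(f"/repos/{user}/{repo}/commits/{sha}")
        self.ensure_user(commit["author"])  # and on the user who authored it
        return commit

    def on_push_event(self, event: dict) -> None:
        # One event fans out into every entity it references.
        for sha in event["shas"]:
            self.ensure_commit(event["user"], event["repo"], sha)


mirror = Mirror(FAKE_API.__getitem__)
mirror.on_push_event({"user": "alice", "repo": "demo", "shas": ["abc123"]})
print(mirror.requests)
```

In this sketch a single push event triggers three fetches (user, repo, commit), and replaying the same event triggers none, because every dependency is already stored.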


  9. relational database


  10. noSQL database as cache
    repositories: /user/repos
    users: /users/:user
    organizations: /orgs/:org
    issues: /repos/:user/:repo/issues
    {
      "type": "User",
      "public_gists": 0,
      "login": "gousiosg",
      "followers": 8,
      "name": "Georgios Gousios",
      "public_repos": 4,
      "created_at": ...,
      "id": 386172,
      "following": 4
    }
    {
      ...
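
One way to read "noSQL database as cache" next to the relational database of the previous slide: keep each raw API response as a document, and build relational rows from the cached copy without calling the API again. The sketch below works under that assumption. It uses a plain dictionary as a stand-in for the document store and an in-memory SQLite database for the relational side. The names `raw_store`, `cache_user` and `load_user_row` and the table layout are hypothetical; the sample document reuses the fields shown on the slide, minus the elided `created_at`.

```python
import json
import sqlite3

# Hypothetical stand-in for a document store: one collection per entity type.
raw_store = {"users": {}}


def cache_user(doc: dict) -> None:
    """Keep the raw API response verbatim, keyed by login."""
    raw_store["users"][doc["login"]] = json.dumps(doc)


def load_user_row(conn: sqlite3.Connection, login: str) -> None:
    """Build the relational row from the cached document, not from the API."""
    doc = json.loads(raw_store["users"][login])
    conn.execute(
        "INSERT OR REPLACE INTO users (id, login, name) VALUES (?, ?, ?)",
        (doc["id"], doc["login"], doc.get("name")),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT UNIQUE, name TEXT)"
)

cache_user({
    "type": "User",
    "public_gists": 0,
    "login": "gousiosg",
    "followers": 8,
    "name": "Georgios Gousios",
    "public_repos": 4,
    "id": 386172,
    "following": 4,
})
load_user_row(conn, "gousiosg")
row = conn.execute("SELECT id, login, name FROM users").fetchone()
print(row)
```

Because the full document stays in the cache, fields that the table does not carry (here `followers` or `public_repos`) can be added to the schema later and backfilled without refetching anything.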


  11. periodic dumps of DBs online


  12. ghtorrent facts
    • 1 developer, no external funding

    • 3 papers

    • advertised on social media

    • since 2012


  13. ghtorrent impact
    300+ external users
    150+ external papers
    msr14, vissoft14, github data mining challenge

    40% of all papers on GitHub (Cosentino et al. 2016)

    many best paper awards

    used at: microsoft, deloitte, blackduck

    received funding from: microsoft, google


  14. why such a difference?
    • github is hot as a research target!

      • true, but so was SourceForge when Alitheia Core analysed it

    • (alitheia core was) not invented here!

      • true, but GHTorrent was of worse quality when it became available

    • i don’t want to invest time in your infrastructure!

      • true, but you still do it with GHTorrent (ok, less)


  15. be open
    or
    be irrelevant


  16. Tools Datasets


  17. what to open?
    at the very least:

    tools

    datasets

    but also:

    papers

    talk slides

    lecture notes

    technical designs

    (successful?) research proposals


  18. how to open?
    • choose a license

      • BSD or MIT for source code

      • CC-BY-SA for data and other materials

    • choose a platform

      • github for src

      • zenodo for data, gives a DOI!

      • slideshare or speakerdeck for slides

      • figshare, pure.tudelft.nl or your site for papers


  20. open now
    trumps
    open when it’s done



  22. how to open now?
    • think in terms of Minimum Viable Product

      • what is the least possible amount of work that will make sense to somebody else?

    • work in iterations

      • open, gather feedback, improve, repeat

    • embrace the “Hacker Way”


  23. but somebody will steal my data/code/ideas!
    it feels amazing to have created something worth stealing!

    • if someone invests time in stealing:

      • what you created is great

      • you have a head start

    • if nobody invests time in stealing:

      • is what you created worth your time/effort?

      • is your research relevant?

    good artists copy; great artists steal


  24. “The problem in this business isn't to keep people
    from stealing your ideas; it is making them steal
    your ideas”
    – Howard H. Aiken
    @gousiosg
