Open source licensing by the numbers

A look at how GitHub users license their code (or in all practicality, don’t)

Ben Balter

July 22, 2015

Transcript

  1. Open source licensing by the numbers
    A look at how GitHub users license their code (or in all practicality, don’t)
    @BenBalter

  2. @BenBalter
    ‣ Gov. Evangelist @ GitHub
    ‣ Attorney (but IANYL)
    ‣ Open source developer

  3. Roadmap
    1. Why do we care about open source licensing?
    2. How the GitHub licensing API works
    3. The numbers

  4.
    1. Why open source licensing matters

  5. Open source ≠ published source

  6. Without an open source license, your code isn’t open source

  7. Open Source (software)
    software that can be freely used, modified, and shared (in both
    modified and unmodified form) by anyone
    help.github.com/articles/github-glossary/#open-source

  8. Open Source
    a philosophy of collaboration in which
    working materials are made available online
    for anyone to fork, modify, discuss, and contribute to.

  9. Licensing moves software from published to open

  10. All open source licenses contain three things

  11.
    1. Copyright license grant
    2. Disclaimer against liability
    3. Attribution requirements

  12. 20% of “open source” projects on GitHub have an open source license

  13. Cue scary music

  14. That’s actually awesome

  16. Pareto Principle - 80% of the effects come from 20% of the causes

  17. Ruby on Rails, Bootstrap
    Your CS homework, that weekend hack project

  18. Why don’t people license their code?

  19. ‣ There’s a bajillion different options
    ‣ Every discussion results in a holy war
    ‣ Open source licensing isn’t taught in law school
    ‣ Devs today grew up in a world in which open source has won

  20. We can’t leave open source licensing in the hands of large corporations

  21. You shouldn’t need to hire a lawyer to contribute to open source

  22. choosealicense.com

  23.
    2. GitHub’s licensing API

  24. How do you identify a project’s license?

  25. What’s necessary to license a project?
    ‣ LICENSE file with the full text of the license
    ‣ LICENSE file with the license name or abbreviation
    ‣ README which links to the full license text
    ‣ README which references the license
    ‣ Human readable references within a file
    ‣ Machine-readable package manager config file
    Halp? github.com/licensee/issues/4
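
The options listed on slide 25 suggest that a first step in identifying a license is deciding which files are worth inspecting at all. The Ruby sketch below illustrates that step only. `LICENSE_PATTERNS` and `license_candidates` are hypothetical names made up for this example, the patterns are a guess at common naming conventions, and none of it is taken from Licensee's source.

```ruby
# Hypothetical helper for illustration; this is not Licensee's own code.
# Patterns are ordered from strongest to weakest signal.
LICENSE_PATTERNS = [
  /\Alicen[sc]e\z/i,            # LICENSE
  /\Alicen[sc]e\.(md|txt)\z/i,  # LICENSE.md, LICENSE.txt
  /\Acopying(\.(md|txt))?\z/i,  # COPYING, common for GNU licenses
  /\Areadme(\.(md|txt))?\z/i    # README, which may only reference a license
].freeze

# Return the file names worth inspecting, best candidate first.
def license_candidates(filenames)
  ranked = filenames.map do |name|
    rank = LICENSE_PATTERNS.index { |pattern| pattern.match?(name) }
    [rank, name] if rank
  end
  ranked.compact.sort.map { |_rank, name| name }
end

puts license_candidates(%w[README.md lib LICENSE.txt Gemfile])
```

Ranking by pattern position keeps a dedicated LICENSE file ahead of a README, which at best references a license.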

  26. github.com/benbalter/licensee

  27. Licensee has 4 matching “strategies”
    1. Copyright matcher
    2. Exact matcher
    3. Git matcher
    4. Levenshtein matcher
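
Slide 27 names a Levenshtein matcher among Licensee's strategies. As a rough illustration of the general technique, and not a copy of Licensee's implementation, the Ruby sketch below computes edit distance and turns it into a similarity score. `levenshtein`, `similarity`, `closest_license`, and the 90% default threshold are all assumptions made for this example.

```ruby
# Classic dynamic-programming edit distance, keeping only two rows.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution
      ].min
    end
    previous = current
  end
  previous.last
end

# Similarity as a percentage of the longer string's length.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 100.0 if longest.zero?

  (1 - levenshtein(a, b).to_f / longest) * 100
end

# Hypothetical matcher: return the key of the known license text closest
# to `candidate`, or nil when nothing clears the threshold.
def closest_license(candidate, known, threshold: 90.0)
  key, text = known.max_by { |_key, body| similarity(candidate, body) }
  key if text && similarity(candidate, text) >= threshold
end
```

Here `known` is assumed to be a hash mapping license keys such as `"mit"` to their full text. Edit distance on full license texts is quadratic in their length, which is one plausible reason to try cheaper strategies first.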

  28. license = Licensee.license "/path/to/a/project"
    => #
    license.key
    => "mit"
    license.name
    => "MIT License"
    license.meta["source"]
    => "http://opensource.org/licenses/MIT"
    license.meta["description"]
    => "A permissive license that is short and to the point. It lets people
    do anything with your code with proper attribution and without
    warranty."
    license.meta["permitted"]
    => ["modifications","distribution","sublicense","private-use"]
    Ruby

  29. $ licensee ~/projects/licensee
    License file: LICENSE.md
    License: MIT License
    Confidence: 94%
    Method: Licensee::GitMatcher
    Command line

  30. Via the GitHub API

  31. $ curl -H 'Accept: application/vnd.github.drax-preview+json' \
    https://api.github.com/licenses
    [
      {
        "key": "agpl-3.0",
        "name": "GNU Affero General Public License v3.0",
        "url": "https://api.github.com/licenses/agpl-3.0",
        "featured": false
      },
      {
        "key": "apache-2.0",
        "name": "Apache License 2.0",
        "url": "https://api.github.com/licenses/apache-2.0",
        "featured": true
      },
      ...

  32. $ curl -H 'Accept: application/vnd.github.drax-preview+json' \
    https://api.github.com/licenses/mit
    {
    "key": "mit",
    "name": "MIT License",
    "url": "https://api.github.com/licenses/mit",
    "featured": true,
    "html_url": "http://choosealicense.com/licenses/mit/",
    "description": "A permissive license that is short and to the point. It lets people do anything with
    your code with proper attribution and without warranty.",
    "category": "MIT",
    "implementation": "Create a text file (typically named LICENSE or LICENSE.txt) in the root of your
    source code and copy the text of the license into the file. Replace [year] with the current year and
    [fullname] with the name (or names) of the copyright holders.",
    "required": [
    "include-copyright"
    ],
    "permitted": [
    "commercial-use",
    "modifications",
    "distribution",


  33. $ curl -H 'Accept: application/vnd.github.drax-preview+json' \
    https://api.github.com/repos/benbalter/gman
    {
      "id": 12325212,
      "name": "gman",
      "full_name": "benbalter/gman",
      ...
      "license": {
        "key": "mit",
        "name": "MIT License",
        "url": "https://api.github.com/licenses/mit",
        "featured": true
      },
      ...
      "network_count": 38,
      "subscribers_count": 5
    }

  34. Audit your org’s open source license usage
    $ curl -H 'Accept: application/vnd.github.drax-preview+json' \
    https://api.github.com/orgs/github/repos

  35. $ curl -s -H 'Accept: application/vnd.github.drax-preview+json' \
    'https://api.github.com/orgs/github/repos?per_page=100' | \
    grep -A1 '"license"' | grep '"key"' | cut -d'"' -f4 | \
    sort | uniq -c
      5 apache-2.0
      1 bsd-3-clause
      2 cc0-1.0
      1 gpl-2.0
     56 mit
     12 other
    h/t @mislav
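
The shell pipeline on slide 35 can be approximated in Ruby for anyone who would rather parse the JSON than grep it. This is a sketch under stated assumptions: the response is taken to be a JSON array of repository objects, each with a `license` object carrying a `key` as on slide 33. `license_counts` and the `"none"` label for repositories without a license are hypothetical choices for this example. It works on a string, so the HTTP request itself is left out.

```ruby
require "json"

# Count repositories per license key. Repositories whose "license" is
# null are grouped under "none", a label chosen for this sketch.
def license_counts(json)
  counts = Hash.new(0)
  JSON.parse(json).each do |repo|
    counts[repo.dig("license", "key") || "none"] += 1
  end
  counts.sort.to_h
end

# Made-up sample data shaped like the API response.
sample = <<~SAMPLE
  [
    {"name": "gman", "license": {"key": "mit"}},
    {"name": "licensee", "license": {"key": "mit"}},
    {"name": "scratch", "license": null}
  ]
SAMPLE

license_counts(sample).each { |key, count| puts format("%4d %s", count, key) }
```

Unlike the grep pipeline, which silently skips repositories with no detected license, this version reports them, which may be the more useful number for an audit.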

  36. Audit all project dependencies
    #!/bin/bash
    set -e
    { ruby -rbundler -e 'puts Bundler.load.specs.map(&:gem_dir)'
      ls -d node_modules/* bower_components/*
    } | while read dir; do
      echo -n "${dir##*/}: "
      licensee "$dir" | grep 'License:\|Unknown' | sed 's/License: //'
    done
    h/t @mislav

  37. Audit all project dependencies
    minitest-5.4.2         : Unknown
    thread_safe-0.3.4      : no license
    tzinfo-1.2.2           : MIT License
    activesupport-4.1.6    : MIT License
    coderay-1.1.0          : Unknown
    ffi-1.9.10             : BSD 3-clause "New" or "Revised" License
    levenshtein-ffi-1.1.0  : Unknown
    rugged-0.23.0b4        : MIT License
    licensee               : MIT License
    method_source-0.8.2    : MIT License
    slop-3.6.0             : MIT License
    pry-0.10.1             : MIT License
    ruby-prof-0.15.1       : BSD 2-clause "Simplified" License
    shoulda-context-1.2.1  : MIT License
    shoulda-matchers-2.7.0 : MIT License
    shoulda-3.5.0          : MIT License
    bundler-1.6.9          : MIT License
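
An audit like the one on slide 37 is easier to act on once the unresolved entries are pulled out. The Ruby sketch below assumes the report is plain text with one `name : license` pair per line, as on the slide. `UNRESOLVED` and `unresolved_dependencies` are hypothetical names for this example, and treating "Unknown" and "no license" as the only unresolved values is an assumption based on that sample output.

```ruby
# Values that signal a dependency needs a manual look (assumed from the
# sample output, not an exhaustive list).
UNRESOLVED = ["Unknown", "no license"].freeze

# Return the names of dependencies whose license could not be resolved.
def unresolved_dependencies(report)
  report.each_line.map do |line|
    name, license = line.split(":", 2).map(&:strip)
    name if UNRESOLVED.include?(license)
  end.compact
end

report = <<~REPORT
  minitest-5.4.2 : Unknown
  thread_safe-0.3.4 : no license
  tzinfo-1.2.2 : MIT License
REPORT

puts unresolved_dependencies(report)
```

Splitting on the first colon only keeps the parser tolerant of both `name : license` and `name: license` spacing.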

  38.
    3. Actual license usage across all GitHub repos

  39. 10M users, 25M projects

  40. A couple of caveats
    ‣ Only looking at public repos
    ‣ Only looking at non-fork repos
    ‣ Only looking at non-spammy users
    ‣ Excludes some edge cases like Project GITenberg
    ‣ I’m terrible at math (and even worse at MySQL)

  41. choosealicense.com

  42. Have license preferences changed over time?

  43. The power of defaults

  45. Does license usage differ by language?

  46. Most licensed languages
    Go (53%), CoffeeScript (49%), C (42%), Objective-C (41%), Scala (41%)

  47. Least licensed languages
    ASP (8%), R (8%), HTML (15%), VB (16%), ColdFusion (19%)

  48. Most used languages
    JavaScript (37%), Java (27%), Ruby (36%), Python (39%), PHP (35%)

  49. Does the owner influence license usage?

  50. License usage by owner
    Organization - 34.84%
    User - 14.94%

  52. Does collaboration affect license usage?

  53. Remember the 80/20 rule

  55. Percent licensed by use
    Count      Forks    Pull Requests   Stars
    0          27.50%   14.97%          13.71%
    1-100      35.60%   35.83%          25.44%
    101-500    64.26%   63.76%          64.58%
    501-1000   72.11%   73.34%          72.31%
    1000+      69.43%   77.05%          77.23%

  56. With 1 PR: 15% -> 36% licensed

  57. With 5 PRs: 36% -> 55% licensed

  58. Does license choice affect contribution?

  60. Licensed projects are 10% more likely to have at least one pull request

  61. (My) Conclusions

  62. We need to make it easier to license your code

  63. We need fewer licenses

  64. We need cross-platform standards

  65. 80% of projects on GitHub are unlicensed, and that’s okay

  66. Companies are concerned with open source licenses

  67. Emerging developers live in a post-licensing world

  68. Open source languages are more likely to produce open source projects

  69. The more valuable a project is, the more likely it is to be licensed

  70. The majority of non-trivial projects are licensed

  71. It doesn’t matter what license you choose, so long as you choose a license

  72. How can we empower the next generation of developers to <3 open source?

  73. Open source licensing by the numbers
    A look at how GitHub users license their code (or in all practicality, don’t)
    @BenBalter