Open source licensing by the numbers

A look at how GitHub users license their code (or in all practicality, don’t)

Ben Balter

July 22, 2015

Transcript

  1. Open source licensing by the numbers: a look at how GitHub users license their code (or in all practicality, don’t). @BenBalter, opensource@github.com
  2. @BenBalter: Gov. Evangelist @ GitHub; Attorney (but IANYL); Open source developer
  3. Roadmap
      1. Why do we care about open source licensing?
      2. How the GitHub licensing API works
      3. The numbers
  4. 1. Why open source licensing matters
  5. Open source ≠ published source

  6. Without an open source license, your code isn’t open source
  7. Open Source (software): software that can be freely used, modified, and shared (in both modified and unmodified form) by anyone. help.github.com/articles/github-glossary/#open-source
  8. Open Source: a philosophy of collaboration in which working materials are made available online for anyone to fork, modify, discuss, and contribute to.
  9. Licensing moves software from published to open
  10. All open source licenses contain three things
  11. 1. Copyright license grant
      2. Disclaimer against liability
      3. Attribution requirements
  12. 20% of “open source” projects on GitHub have an open source license
  13. Cue scary music
  14. That’s actually awesome

  15. None
  16. Pareto Principle: 80% of the effects come from 20% of the causes
  17. Ruby on Rails, Bootstrap / Your CS homework, that weekend hack project
  18. Why don’t people license their code?
  19. ‣ There’s a bajillion different options
      ‣ Every discussion results in a holy war
      ‣ Open source licensing isn’t taught in law school
      ‣ Devs today grew up in a world in which open source has won
  20. We can’t leave open source licensing in the hands of large corporations
  21. You shouldn’t need to hire a lawyer to contribute to open source
  22. choosealicense.com

  23. 2. GitHub’s licensing API
  24. How do you identify a project’s license?
  25. What’s necessary to license a project?
      ‣ LICENSE file with the full text of the license
      ‣ LICENSE file with the license name or abbreviation
      ‣ README which links to the full license text
      ‣ README which references the license
      ‣ Human-readable references within a file
      ‣ Machine-readable package manager config file
      Halp? github.com/licensee/issues/4
  26. github.com/benbalter/licensee

  27. Licensee has 4 matching “strategies”
      1. Copyright matcher
      2. Exact matcher
      3. Git matcher
      4. Levenshtein matcher
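
As a rough illustration of the last strategy on slide 27, here is a minimal sketch of Levenshtein-based matching in plain Ruby. It is not Licensee's actual implementation: `levenshtein`, `similarity`, and `best_match` are hypothetical helpers, and the two template strings are shortened stand-ins for real license texts.

```ruby
# Classic dynamic-programming edit distance between two strings.
# Keeps only the previous row of the distance table in memory.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    curr = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Normalize the distance into a 0.0..1.0 similarity score.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein(a, b).to_f / longest
end

# Hypothetical helper: return [key, score] for the closest template.
def best_match(text, templates)
  templates.map { |key, body| [key, similarity(text, body)] }
           .max_by { |_key, score| score }
end

# Shortened stand-ins, not the real license texts.
templates = {
  "mit" => "permission is hereby granted, free of charge",
  "gpl" => "this program is free software"
}

key, score = best_match("permission is hereby granted", templates)
puts "#{key}: #{score.round(2)}"
```

A score like this is one plausible reading of the "match" and "Confidence" values shown on the next two slides, but how Licensee actually derives them is not stated in the deck.
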
  28. Ruby

      license = Licensee.license "/path/to/a/project"
      => #<Licensee::License name="MIT" match=0.9842154131847726>

      license.key
      => "mit"

      license.name
      => "MIT License"

      license.meta["source"]
      => "http://opensource.org/licenses/MIT"

      license.meta["description"]
      => "A permissive license that is short and to the point. It lets people do anything with your code with proper attribution and without warranty."

      license.meta["permitted"]
      => ["modifications","distribution","sublicense","private-use"]
  29. Command line

      $ licensee ~/projects/licensee
      License file: LICENSE.md
      License: MIT License
      Confidence: 94%
      Method: Licensee::GitMatcher
  30. Via the GitHub API
  31. $ curl -H 'Accept: application/vnd.github.drax-preview+json' \
          https://api.github.com/licenses

      [
        {
          "key": "agpl-3.0",
          "name": "GNU Affero General Public License v3.0",
          "url": "https://api.github.com/licenses/agpl-3.0",
          "featured": false
        },
        {
          "key": "apache-2.0",
          "name": "Apache License 2.0",
          "url": "https://api.github.com/licenses/apache-2.0",
          "featured": true
        },
        ...
  32. $ curl -H 'Accept: application/vnd.github.drax-preview+json' \
          https://api.github.com/licenses/mit

      {
        "key": "mit",
        "name": "MIT License",
        "url": "https://api.github.com/licenses/mit",
        "featured": true,
        "html_url": "http://choosealicense.com/licenses/mit/",
        "description": "A permissive license that is short and to the point. It lets people do anything with your code with proper attribution and without warranty.",
        "category": "MIT",
        "implementation": "Create a text file (typically named LICENSE or LICENSE.txt) in the root of your source code and copy the text of the license into the file. Replace [year] with the current year and [fullname] with the name (or names) of the copyright holders.",
        "required": [
          "include-copyright"
        ],
        "permitted": [
          "commercial-use",
          "modifications",
          "distribution",
  33. $ curl -H 'Accept: application/vnd.github.drax-preview+json' \
          https://api.github.com/repos/benbalter/gman

      {
        "id": 12325212,
        "name": "gman",
        "full_name": "benbalter/gman",
        ...
        "license": {
          "key": "mit",
          "name": "MIT License",
          "url": "https://api.github.com/licenses/mit",
          "featured": true
        },
        ...
        "network_count": 38,
        "subscribers_count": 5
      }
  34. Audit your org’s open source license usage

      $ curl -H 'Accept: application/vnd.github.drax-preview+json' \
          https://api.github.com/orgs/github/repos
  35. $ curl -s -H 'Accept: application/vnd.github.drax-preview+json' \
          'https://api.github.com/orgs/github/repos?per_page=100' | \
          grep -A1 '"license"' | grep '"key"' | cut -d'"' -f4 | \
          sort | uniq -c

         5 apache-2.0
         1 bsd-3-clause
         2 cc0-1.0
         1 gpl-2.0
        56 mit
        12 other

      h/t @mislav
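
The tally on slide 35 can also be sketched in Ruby. This is a minimal sketch, assuming the API response has already been fetched and that each repository object carries a `license` object with a `key`, as in the response on slide 33. `tally_licenses` and the sample data are hypothetical.

```ruby
require "json"

# Hypothetical helper: count repositories per license key in an
# already-fetched /orgs/:org/repos response body.
# Repositories whose "license" is null are skipped.
def tally_licenses(json_body)
  repos = JSON.parse(json_body)
  keys = repos.map { |repo| repo.dig("license", "key") }.compact
  keys.each_with_object(Hash.new(0)) { |key, counts| counts[key] += 1 }
end

# Made-up sample data, not real API output.
SAMPLE = <<~JSON
  [
    {"name": "a", "license": {"key": "mit", "name": "MIT License"}},
    {"name": "b", "license": {"key": "mit", "name": "MIT License"}},
    {"name": "c", "license": {"key": "apache-2.0", "name": "Apache License 2.0"}},
    {"name": "d", "license": null}
  ]
JSON

# Print counts in the same "count key" layout as `uniq -c`.
tally_licenses(SAMPLE).sort.each do |key, count|
  puts format("%4d %s", count, key)
end
```

Parsing the JSON avoids depending on how the response happens to be line-wrapped, which the `grep -A1` approach relies on.
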
  36. Audit all project dependencies (h/t @mislav)

      #!/bin/bash
      set -e

      {
        ruby -rbundler -e 'puts Bundler.load.specs.map(&:gem_dir)'
        ls -d node_modules/* bower_components/*
      } | while read dir; do
        echo -n "${dir##*/}: "
        licensee "$dir" | grep 'License:\|Unknown' | sed 's/License: //'
      done
  37. Audit all project dependencies

      minitest-5.4.2        : Unknown
      thread_safe-0.3.4     : no license
      tzinfo-1.2.2          : MIT License
      activesupport-4.1.6   : MIT License
      coderay-1.1.0         : Unknown
      ffi-1.9.10            : BSD 3-clause "New" or "Revised" License
      levenshtein-ffi-1.1.0 : Unknown
      rugged-0.23.0b4       : MIT License
      licensee              : MIT License
      method_source-0.8.2   : MIT License
      slop-3.6.0            : MIT License
      pry-0.10.1            : MIT License
      ruby-prof-0.15.1      : BSD 2-clause "Simplified" License
      shoulda-context-1.2.1 : MIT License
      shoulda-matchers-2.7.0: MIT License
      shoulda-3.5.0         : MIT License
      bundler-1.6.9         : MIT License
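
Audit output like slide 37's could be post-processed to surface the dependencies that still need attention. This is a minimal sketch, assuming the `name : license` line format shown on that slide; `unlicensed_dependencies` is a hypothetical helper, not part of Licensee.

```ruby
# Hypothetical helper: return the names of dependencies whose detected
# license is "Unknown" or "no license" in `name : license` audit output.
def unlicensed_dependencies(audit_output)
  audit_output.each_line.map do |line|
    name, license = line.strip.split(/\s*:\s*/, 2)
    next if name.nil? || license.nil?

    name if license =~ /\A(unknown|no license)\z/i
  end.compact
end

# A few lines copied from the audit output on slide 37.
AUDIT = <<~TEXT
  minitest-5.4.2 : Unknown
  thread_safe-0.3.4 : no license
  tzinfo-1.2.2 : MIT License
  shoulda-matchers-2.7.0: MIT License
TEXT

puts unlicensed_dependencies(AUDIT)
```
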
  38. 3. Actual license usage across all GitHub repos
  39. 10M users, 25M projects
  40. A couple of caveats
      ‣ Only looking at public repos
      ‣ Only looking at non-fork repos
      ‣ Only looking at non-spammy users
      ‣ Excludes some edge cases like Project GITenberg
      ‣ I’m terrible at math (and even worse at MySQL)
  41. choosealicense.com

  42. Have license preferences changed over time?
  43. The power of defaults

  44. None
  45. Does license usage differ by language?
  46. Most licensed languages: Go (53%), CoffeeScript (49%), C (42%), Objective-C (41%), Scala (41%)
  47. Least licensed languages: ASP (8%), R (8%), HTML (15%), VB (16%), ColdFusion (19%)
  48. Most used languages: JavaScript (37%), Java (27%), Ruby (36%), Python (39%), PHP (35%)
  49. Does the owner influence license usage?
  50. License usage by owner: Organization - 34.84%, User - 14.94%
  51. None
  52. Does collaboration affect license usage?
  53. Remember the 80/20 rule

  54. None
  55. Percent licensed by use

                 Forks    Pull Requests   Stars
      0          27.50%   14.97%          13.71%
      1-100      35.60%   35.83%          25.44%
      101-500    64.26%   63.76%          64.58%
      501-1000   72.11%   73.34%          72.31%
      1000+      69.43%   77.05%          77.23%
  56. With 1 PR: 15% -> 36% licensed

  57. With 5 PRs: 36% -> 55% licensed

  58. Does license choice affect contribution?
  59. None
  60. Licensed projects are 10% more likely to have at least one pull request
  61. (My) Conclusions
  62. We need to make it easier to license your code
  63. We need fewer licenses
  64. We need cross-platform standards
  65. 80% of projects on GitHub are unlicensed, and that’s okay
  66. Companies are concerned with open source licenses
  67. Emerging developers live in a post-licensing world

  68. Open source languages are more likely to produce open source projects
  69. The more valuable a project is, the more likely it is to be licensed
  70. The majority of non-trivial projects are licensed
  71. It doesn’t matter what license you choose, so long as you choose a license
  72. How can we empower the next generation of developers to <3 open source?
  73. Open source licensing by the numbers: a look at how GitHub users license their code (or in all practicality, don’t). @BenBalter, opensource@github.com