Slide 1

Slide 1 text

Open source licensing by the numbers
A look at how GitHub users license their code (or in all practicality, don’t)
@BenBalter

Slide 2

Slide 2 text

@BenBalter
‣ Gov. Evangelist @ GitHub
‣ Attorney (but IANYL)
‣ Open source developer

Slide 3

Slide 3 text

Roadmap
1. Why do we care about open source licensing?
2. How the GitHub licensing API works
3. The numbers

Slide 4

Slide 4 text

1. Why open source licensing matters

Slide 5

Slide 5 text

Open source ≠ published source

Slide 6

Slide 6 text

Without an open source license, your code isn’t open source

Slide 7

Slide 7 text

Open Source (software)
software that can be freely used, modified, and shared (in both modified and unmodified form) by anyone
help.github.com/articles/github-glossary/#open-source

Slide 8

Slide 8 text

Open Source
a philosophy of collaboration in which working materials are made available online for anyone to fork, modify, discuss, and contribute to.

Slide 9

Slide 9 text

Licensing moves software from published to open

Slide 10

Slide 10 text

All open source licenses contain three things

Slide 11

Slide 11 text

1. Copyright license grant
2. Disclaimer against liability
3. Attribution requirements

Slide 12

Slide 12 text

20% of “open source” projects on GitHub have an open source license

Slide 13

Slide 13 text

Cue scary music

Slide 14

Slide 14 text

That’s actually awesome

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

Pareto Principle: 80% of the effects come from 20% of the causes

Slide 17

Slide 17 text

Ruby on Rails, Bootstrap
Your CS homework, that weekend hack project

Slide 18

Slide 18 text

Why don’t people license their code?

Slide 19

Slide 19 text

‣ There’s a bajillion different options
‣ Every discussion results in a holy war
‣ Open source licensing isn’t taught in law school
‣ Devs today grew up in a world in which open source has won

Slide 20

Slide 20 text

We can’t leave open source licensing in the hands of large corporations

Slide 21

Slide 21 text

You shouldn’t need to hire a lawyer to contribute to open source

Slide 22

Slide 22 text

choosealicense.com

Slide 23

Slide 23 text

2. GitHub’s licensing API

Slide 24

Slide 24 text

How do you identify a project’s license?

Slide 25

Slide 25 text

What’s necessary to license a project?
‣ LICENSE file with the full text of the license
‣ LICENSE file with the license name or abbreviation
‣ README which links to the full license text
‣ README which references the license
‣ Human-readable references within a file
‣ Machine-readable package manager config file
Halp? github.com/licensee/issues/4

Slide 26

Slide 26 text

github.com/benbalter/licensee

Slide 27

Slide 27 text

Licensee has 4 matching “strategies”
1. Copyright matcher
2. Exact matcher
3. Git matcher
4. Levenshtein matcher
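Speaker-note sketch: the fourth strategy is the fuzzy one. Below is a minimal Ruby sketch of what Levenshtein-style matching could look like, assuming whitespace and case are normalized first and that anything under a 90% similarity is rejected. The helper names (`levenshtein`, `similarity`, `best_match`), the threshold, and the toy license texts are made up for illustration; this is not Licensee's actual code.

```ruby
# Illustrative only: NOT Licensee's implementation.
# Classic dynamic-programming Levenshtein edit distance, one row at a time.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Collapse whitespace and case so formatting differences do not count as edits.
def normalize(text)
  text.downcase.gsub(/\s+/, " ").strip
end

# Similarity as a percentage; 100.0 means identical after normalization.
def similarity(a, b)
  a = normalize(a)
  b = normalize(b)
  longest = [a.length, b.length].max
  return 100.0 if longest.zero?
  (1 - levenshtein(a, b).to_f / longest) * 100
end

# Return [key, rounded_score] for the closest known text, or nil below the threshold.
# The 90% threshold is an assumption chosen for this sketch.
def best_match(text, known, threshold: 90)
  return nil if known.empty?
  key, body = known.max_by { |_, candidate| similarity(text, candidate) }
  score = similarity(text, body)
  score >= threshold ? [key, score.round] : nil
end

# Toy "license" texts, hypothetical and far shorter than any real license.
KNOWN = {
  "toy-a" => "Permission is granted to use, copy, and modify this software.",
  "toy-b" => "You may not use this file except in compliance with the license."
}.freeze

p best_match("Permission is  granted to use, copy and modify this software.", KNOWN)
```

The point of the normalization step is that a reflowed or re-cased LICENSE file should still score as a match; only real wording changes eat into the score.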

Slide 28

Slide 28 text

Ruby
license = Licensee.license "/path/to/a/project"
=> #
license.key
=> "mit"
license.name
=> "MIT License"
license.meta["source"]
=> "http://opensource.org/licenses/MIT"
license.meta["description"]
=> "A permissive license that is short and to the point. It lets people do anything with your code with proper attribution and without warranty."
license.meta["permitted"]
=> ["modifications","distribution","sublicense","private-use"]

Slide 29

Slide 29 text

Command line
$ licensee ~/projects/licensee
License file: LICENSE.md
License: MIT License
Confidence: 94%
Method: Licensee::GitMatcher

Slide 30

Slide 30 text

Via the GitHub API

Slide 31

Slide 31 text

$ curl -H 'Accept: application/vnd.github.drax-preview+json' \
    https://api.github.com/licenses
[
  {
    "key": "agpl-3.0",
    "name": "GNU Affero General Public License v3.0",
    "url": "https://api.github.com/licenses/agpl-3.0",
    "featured": false
  },
  {
    "key": "apache-2.0",
    "name": "Apache License 2.0",
    "url": "https://api.github.com/licenses/apache-2.0",
    "featured": true
  },
  ...

Slide 32

Slide 32 text

$ curl -H 'Accept: application/vnd.github.drax-preview+json' \
    https://api.github.com/licenses/mit
{
  "key": "mit",
  "name": "MIT License",
  "url": "https://api.github.com/licenses/mit",
  "featured": true,
  "html_url": "http://choosealicense.com/licenses/mit/",
  "description": "A permissive license that is short and to the point. It lets people do anything with your code with proper attribution and without warranty.",
  "category": "MIT",
  "implementation": "Create a text file (typically named LICENSE or LICENSE.txt) in the root of your source code and copy the text of the license into the file. Replace [year] with the current year and [fullname] with the name (or names) of the copyright holders.",
  "required": [
    "include-copyright"
  ],
  "permitted": [
    "commercial-use",
    "modifications",
    "distribution",

Slide 33

Slide 33 text

$ curl -H 'Accept: application/vnd.github.drax-preview+json' \
    https://api.github.com/repos/benbalter/gman
{
  "id": 12325212,
  "name": "gman",
  "full_name": "benbalter/gman",
  ...
  "license": {
    "key": "mit",
    "name": "MIT License",
    "url": "https://api.github.com/licenses/mit",
    "featured": true
  },
  ...
  "network_count": 38,
  "subscribers_count": 5
}

Slide 34

Slide 34 text

Audit your org’s open source license usage
$ curl -H 'Accept: application/vnd.github.drax-preview+json' \
    https://api.github.com/orgs/github/repos

Slide 35

Slide 35 text

$ curl -s -H 'Accept: application/vnd.github.drax-preview+json' \
    'https://api.github.com/orgs/github/repos?per_page=100' | \
    grep -A1 '"license"' | grep '"key"' | cut -d'"' -f4 | \
    sort | uniq -c
   5 apache-2.0
   1 bsd-3-clause
   2 cc0-1.0
   1 gpl-2.0
  56 mit
  12 other
h/t @mislav
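Speaker-note sketch: the grep/cut pipeline above works, but it silently skips repos whose `license` field is null. Here is a rough Ruby take on the same tally that works on the parsed JSON instead, assuming each repository object carries a `license` hash with a `key` (as in the gman response) or null. The `tally_licenses` name, the `"none"` label, and the sample repos are my own inventions for illustration, not API output.

```ruby
require "json"

# Count repositories per license key in an already-parsed list of repo objects.
# Repos with a null "license" are counted under "none" (a label made up here).
def tally_licenses(repos)
  counts = Hash.new(0)
  repos.each do |repo|
    license = repo["license"]
    counts[license ? license["key"] : "none"] += 1
  end
  counts.sort.to_h
end

# Hypothetical sample shaped like the repository objects; not real API data.
sample = JSON.parse(<<~JSON)
  [
    { "name": "alpha", "license": { "key": "mit", "name": "MIT License" } },
    { "name": "beta",  "license": { "key": "mit", "name": "MIT License" } },
    { "name": "gamma", "license": { "key": "apache-2.0", "name": "Apache License 2.0" } },
    { "name": "delta", "license": null }
  ]
JSON

# Print in the same "count key" shape as `uniq -c`.
tally_licenses(sample).each { |key, count| puts format("%4d %s", count, key) }
```

To run it against a real org, you would feed it the body of the org repos request instead of `sample`; paging past 100 repos is left out of this sketch.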

Slide 36

Slide 36 text

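Audit all project dependencies
#!/bin/bash
set -e
# List every dependency directory: Bundler gems, then npm and Bower packages
{
  ruby -rbundler -e 'puts Bundler.load.specs.map(&:gem_dir)'
  ls -d node_modules/* bower_components/*
} | while read dir; do
  # Print the dependency name (last path segment), then its detected license
  echo -n "${dir##*/}: "
  licensee "$dir" | grep 'License:\|Unknown' | sed 's/License: //'
done
h/t @mislav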

Slide 37

Slide 37 text

Audit all project dependencies
minitest-5.4.2         : Unknown
thread_safe-0.3.4      : no license
tzinfo-1.2.2           : MIT License
activesupport-4.1.6    : MIT License
coderay-1.1.0          : Unknown
ffi-1.9.10             : BSD 3-clause "New" or "Revised" License
levenshtein-ffi-1.1.0  : Unknown
rugged-0.23.0b4        : MIT License
licensee               : MIT License
method_source-0.8.2    : MIT License
slop-3.6.0             : MIT License
pry-0.10.1             : MIT License
ruby-prof-0.15.1       : BSD 2-clause "Simplified" License
shoulda-context-1.2.1  : MIT License
shoulda-matchers-2.7.0 : MIT License
shoulda-3.5.0          : MIT License
bundler-1.6.9          : MIT License
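Speaker-note sketch: the interesting rows in that audit are the ones without a detected license. A small Ruby filter along these lines could pull them out, assuming each line has the `name : license` shape shown on the slide. `needs_review` and `UNRESOLVED` are hypothetical names; the sample lines are copied from the audit output.

```ruby
# Labels that mean no license was detected, as they appear in the audit output.
UNRESOLVED = ["Unknown", "no license"].freeze

# Split "name : license" lines and keep the names that need a manual look.
def needs_review(lines)
  lines.map do |line|
    name, license = line.split(":", 2).map(&:strip)
    name if UNRESOLVED.include?(license)
  end.compact
end

audit = [
  "minitest-5.4.2 : Unknown",
  "thread_safe-0.3.4 : no license",
  "tzinfo-1.2.2 : MIT License",
  "coderay-1.1.0 : Unknown",
  "ffi-1.9.10 : BSD 3-clause \"New\" or \"Revised\" License"
]

puts needs_review(audit)
```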

Slide 38

Slide 38 text

3. Actual license usage across all GitHub repos

Slide 39

Slide 39 text

10M users, 25M projects

Slide 40

Slide 40 text

A couple of caveats
‣ Only looking at public repos
‣ Only looking at non-fork repos
‣ Only looking at non-spammy users
‣ Excludes some edge cases like Project GITenberg
‣ I’m terrible at math (and even worse at MySQL)

Slide 41

Slide 41 text

choosealicense.com

Slide 42

Slide 42 text

Have license preferences changed over time?

Slide 43

Slide 43 text

The power of defaults

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

Does license usage differ by language?

Slide 46

Slide 46 text

Most licensed languages
Go (53%), CoffeeScript (49%), C (42%), Objective-C (41%), Scala (41%)

Slide 47

Slide 47 text

Least licensed languages
ASP (8%), R (8%), HTML (15%), VB (16%), ColdFusion (19%)

Slide 48

Slide 48 text

Most used languages
JavaScript (37%), Java (27%), Ruby (36%), Python (39%), PHP (35%)

Slide 49

Slide 49 text

Does the owner influence license usage?

Slide 50

Slide 50 text

License usage by owner
Organization - 34.84%
User - 14.94%

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

Does collaboration affect license usage?

Slide 53

Slide 53 text

Remember the 80/20 rule

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

Percent licensed by use
            Forks     Pull Requests   Stars
0           27.50%    14.97%          13.71%
1-100       35.60%    35.83%          25.44%
101-500     64.26%    63.76%          64.58%
501-1000    72.11%    73.34%          72.31%
1000+       69.43%    77.05%          77.23%
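Speaker-note sketch: each cell in that table is just "licensed repos in the bucket, divided by all repos in the bucket". A rough Ruby version of that bucketing might look like this, assuming each repo is reduced to a count (forks, pull requests, or stars) and a licensed flag. Treating "1000+" as "more than 1000" is my reading of the overlapping labels, and the sample rows are invented, not the talk's data.

```ruby
# Bucket labels follow the table; "1000+" is assumed to mean more than 1000.
BUCKETS = {
  "0"        => 0..0,
  "1-100"    => 1..100,
  "101-500"  => 101..500,
  "501-1000" => 501..1000,
  "1000+"    => 1001..Float::INFINITY
}.freeze

# Find the label of the bucket whose range covers the count.
def bucket_for(count)
  BUCKETS.find { |_, range| range.cover?(count) }&.first
end

# repos: array of hashes with :count (Integer) and :licensed (true/false).
# Returns { bucket label => percent licensed }, skipping empty buckets.
def percent_licensed(repos)
  grouped = repos.group_by { |repo| bucket_for(repo[:count]) }
  BUCKETS.keys.each_with_object({}) do |label, result|
    group = grouped[label]
    next unless group
    licensed = group.count { |repo| repo[:licensed] }
    result[label] = (100.0 * licensed / group.size).round(2)
  end
end

# Made-up rows for illustration only.
repos = [
  { count: 0,   licensed: false },
  { count: 0,   licensed: false },
  { count: 0,   licensed: true },
  { count: 0,   licensed: false },
  { count: 3,   licensed: true },
  { count: 40,  licensed: false },
  { count: 250, licensed: true }
]

p percent_licensed(repos)
```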

Slide 56

Slide 56 text

With 1 PR: 15% -> 36% licensed

Slide 57

Slide 57 text

With 5 PRs: 36% -> 55% licensed

Slide 58

Slide 58 text

Does license choice affect contribution?

Slide 59

Slide 59 text

No content

Slide 60

Slide 60 text

Licensed projects are 10% more likely to have at least one pull request

Slide 61

Slide 61 text

(My) Conclusions

Slide 62

Slide 62 text

We need to make it easier to license your code

Slide 63

Slide 63 text

We need fewer licenses

Slide 64

Slide 64 text

We need cross-platform standards

Slide 65

Slide 65 text

80% of projects on GitHub are unlicensed, and that’s okay

Slide 66

Slide 66 text

Companies are concerned with open source licenses

Slide 67

Slide 67 text

Emerging developers live in a post-licensing world

Slide 68

Slide 68 text

Open source languages are more likely to produce open source projects

Slide 69

Slide 69 text

The more valuable a project is, the more likely it is to be licensed

Slide 70

Slide 70 text

The majority of non-trivial projects are licensed

Slide 71

Slide 71 text

It doesn’t matter what license you choose, so long as you choose a license

Slide 72

Slide 72 text

How can we empower the next generation of developers to <3 open source?

Slide 73

Slide 73 text

Open source licensing by the numbers
A look at how GitHub users license their code (or in all practicality, don’t)
@BenBalter