Slide 1

Slide 1 text

PROBING RUBY PROJECTS AGGRESSIVELY

Slide 2

Slide 2 text

@HOLMAN

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

METHODOLOGY

Slide 5

Slide 5 text

SO I WROTE CODE THAT DESCRIBES RUBY

Slide 6

Slide 6 text

INDEX + ANALYZE

Slide 8

Slide 8 text

1 REPOSITORY
2 PROBE ANALYSIS
3 REPORTING

Slide 9

Slide 9 text

1 REPOSITORY
git clone

Slide 10

Slide 10 text

2 PROBE ANALYSIS
“Probe”: a class that looks for something in your project

Slide 11

Slide 11 text

3 REPORTING
Run a report on what we discovered about your project

Slide 12

Slide 12 text

REPOSITORY PROBES REPORTS

Slide 13

Slide 13 text

REPOSITORY PROBES REPORTS
Simple, right?

Slide 14

Slide 14 text

REPOSITORY PROBES REPORTS
15,698

Slide 15

Slide 15 text

REPOSITORY PROBES REPORTS
15,698 / 17 FILES, 41 TESTS

Slide 16

Slide 16 text

REPOSITORY PROBES REPORTS
15,698 / 17 FILES, 41 TESTS / 10 REPO SLICES

Slide 17

Slide 17 text

REPOSITORY SLICES
- Split all SHAs into ten slices
- Probe history at each slice
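
One way the slicing step could look, as a minimal sketch. The helper name `slice_shas` and the choice to probe the last SHA of each slice are assumptions for illustration, not Hopper's actual code:

```ruby
# Hypothetical helper: given a project's SHAs (oldest first), pick one
# snapshot SHA at the end of each of `count` roughly equal slices.
def slice_shas(shas, count = 10)
  # Short histories: every commit is its own slice.
  return shas.dup if shas.length <= count

  # Integer math spreads the picks evenly; the final pick is always
  # the newest SHA.
  (1..count).map { |i| shas[(i * shas.length / count) - 1] }
end

shas = (1..25).map { |i| format("sha%02d", i) }
snapshots = slice_shas(shas)
```

Each snapshot would then be checked out and probed, so a project's metrics can be compared across its history.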

Slide 18

Slide 18 text

DO THE MATH 80,000,000,000,000 metrics (or something)

Slide 19

Slide 19 text

GOAL: Index projects, see how they change over time, and compare them against all indexed projects

Slide 20

Slide 20 text

CAVEATS

Slide 21

Slide 21 text

NOT A STATISTICIAN
...but this is OPEN SOURCE (so if you are one, fix it)

Slide 22

Slide 22 text

GitHub-only
API v3
Newer Projects
Popular Projects
Only Public Code

Slide 23

Slide 23 text

This is a new project. It’s for fun. Take it lightly (for now).

Slide 24

Slide 24 text

OTHERWISE, everything else in this is 100% ACCURATE and won’t generate an INTERNET FLAMEWAR maybe probably.

Slide 25

Slide 25 text

HOPPER

Slide 26

Slide 26 text

github.com/holman/hopper

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

HOPPER SINATRA REDIS RESQUE HEROKU 1.9 LIBGIT2

Slide 29

Slide 29 text

SINATRA
Simple UI
Minimal Frontend
PJAX
Mustache
SCSS + CoffeeScript

Slide 30

Slide 30 text

REDIS
Primarily schemaless
Dynamic probes
Easy bootstrapping
Redis To Go

Slide 31

Slide 31 text

HEROKU
Easy deployment
Cedar stack is rad, yo

Slide 32

Slide 32 text

RESQUE
Independent workers
Millions of jobs

Slide 33

Slide 33 text

1.9
AST parsing with Ripper
Stop hurting Ruby. Use 1.9.
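
For a sense of what AST parsing with Ripper gives you, here is a small sketch that lists the methods defined in a snippet. The helper `method_names` is made up for this example, and the heredoc syntax needs a Ruby newer than 1.9:

```ruby
require "ripper"

source = <<~RUBY
  class Loc
    def lines
      42
    end
  end
RUBY

# Ripper.sexp returns the syntax tree as nested arrays
# (or nil if the source does not parse).
tree = Ripper.sexp(source)

# Hypothetical helper: walk the nested arrays and collect `def` names.
def method_names(node, found = [])
  return found unless node.is_a?(Array)

  # A :def node looks like [:def, [:@ident, "lines", [line, col]], params, body]
  found << node[1][1] if node[0] == :def
  node.each { |child| method_names(child, found) }
  found
end

names = method_names(tree)
```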

Slide 34

Slide 34 text

LIBGIT2
libgit2: linkable Git library
rugged: a gem for libgit2
fast, easy Git access

Slide 35

Slide 35 text

PROBING

Slide 36

Slide 36 text

WRITING NEW PROBES SHOULD BE TRIVIAL

Slide 37

Slide 37 text

class Loc < Probe
  exposes :lines

  def lines
    repository.files.map do |file|
      repository.read(file).to_s.lines.count
    end.sum
  end
end

APP/PROBES/#{PROBE}.RB

Slide 39

Slide 39 text

class Loc < Probe
  exposes :lines # Report on these methods

  def lines
    repository.files.map do |file|
      repository.read(file).to_s.lines.count
    end.sum
  end
end

APP/PROBES/#{PROBE}.RB
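
To show how an `exposes` declaration could drive reporting, here is a minimal sketch. Everything except the `Loc` probe itself is an assumption: the `Probe` base class, its `results` method, and the `FakeRepository` stand-in are invented for illustration and may not match Hopper's real implementation:

```ruby
# Hypothetical base class; Hopper's real Probe may differ.
class Probe
  # Record the method names a subclass wants reported.
  def self.exposes(*names)
    exposed.concat(names)
  end

  # Each subclass keeps its own list of exposed methods.
  def self.exposed
    @exposed ||= []
  end

  attr_reader :repository

  def initialize(repository)
    @repository = repository
  end

  # Call every exposed method and collect the results by name.
  def results
    self.class.exposed.each_with_object({}) do |name, data|
      data[name] = public_send(name)
    end
  end
end

# Stand-in repository backed by a Hash of path => contents (an assumption).
class FakeRepository
  def initialize(files)
    @files = files
  end

  def files
    @files.keys
  end

  def read(file)
    @files[file]
  end
end

class Loc < Probe
  exposes :lines

  def lines
    repository.files.map { |file| repository.read(file).to_s.lines.count }.sum
  end
end

repo = FakeRepository.new("a.rb" => "x = 1\ny = 2\n", "b.rb" => "puts x\n")
report = Loc.new(repo).results
```

With a design like this, a new probe only needs a class, an `exposes` line, and one method per metric.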

Slide 43

Slide 43 text

APP/MODELS/REPOSITORY.RB

class Repository
  def lines;           end  # bodies omitted on the slide
  def read;            end
  def revisions;       end
  def commit_messages; end
end

COMMON HELPER METHODS
CAN BE ABSTRACTED TO SVN, HG, ETC.

Slide 44

Slide 44 text

A QUICK ASIDE: RUGGED IS COOL

repo = Rugged::Repository.new(path)
repo.lookup('2cc3e9a').message
# => "Run an resque worker\n"

Slide 45

Slide 45 text

A QUICK ASIDE: RUGGED IS COOL

walker = Rugged::Walker.new(repo)
walker.push('2cc3e9a')
walker.map(&:oid)
# => [an, array, of, shas]

Slide 46

Slide 46 text

A QUICK ASIDE: RUGGED IS COOL
- Faster than shelling out (naturally)
- Faster than any other Git library
- Write or stage new commits
- Multi-platform, permissive license, bindings for all major languages
Ruby: gem install rugged

Slide 47

Slide 47 text

SCIENCE!

Slide 48

Slide 48 text

REMEMBER YOUR SCHOOLING:
[1, 2, 3, 4, 4]
MEAN: 2.8
MEDIAN: 3
MODE: 4
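
The three averages are easy to check in Ruby. This is a plain sketch with hypothetical helper names, not code from Hopper:

```ruby
# Arithmetic mean: sum divided by count.
def mean(values)
  values.sum.to_f / values.length
end

# Median: middle value of the sorted list (average of the two middle
# values when the list has an even length).
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Mode: the most frequent value (the first one found if there is a tie).
def mode(values)
  values.group_by { |value| value }.max_by { |_value, group| group.length }.first
end

sample = [1, 2, 3, 4, 4]
averages = { mean: mean(sample), median: median(sample), mode: mode(sample) }
```

The gap between mean and median matters for the rest of the numbers: a handful of huge projects pulls the mean up while the median stays put.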

Slide 49

Slide 49 text

PROJECTS 15,698 most-forked Ruby projects on GitHub

Slide 50

Slide 50 text

SCIENCE #1 MOST PROJECTS ARE LONELY

Slide 51

Slide 51 text

CONTRIBUTORS 3.77 (MEAN)

Slide 52

Slide 52 text

CONTRIBUTORS 2 (MEDIAN)

Slide 53

Slide 53 text

Popular projects get a disproportionate amount of help.

Slide 54

Slide 54 text

FOLLOWERS 22 (MEDIAN)

Slide 55

Slide 55 text

FORKS 5 (MEDIAN) five forks for two contributors means three inactive or ignored forks

Slide 56

Slide 56 text

Again, this doesn’t take into account the bottom 90%, either.

Slide 57

Slide 57 text

Open source is a long, long, lonely tail.

Slide 58

Slide 58 text

SCIENCE #2 OFFENSIVE RUBY CODE IS OFFENSIVE

Slide 59

Slide 59 text

SWEAR WORDS 0.50 PER PROJECT

Slide 60

Slide 60 text

PERHAPS MORE OFFENSIVELY,
DEFINE_METHOD()S 4.27 PER PROJECT
(and 30.1 SEND()s, but that’s harder to measure)
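
A rough sketch of how such calls could be counted, assuming a token-based approach with `Ripper.lex`. The helper `count_identifiers` is hypothetical, and Hopper's own probe may count differently:

```ruby
require "ripper"

# Hypothetical helper: count identifier tokens by name, so mentions
# inside strings and comments are ignored.
def count_identifiers(source, names)
  counts = Hash.new(0)
  Ripper.lex(source).each do |(_position, type, token)|
    counts[token] += 1 if type == :on_ident && names.include?(token)
  end
  counts
end

source = <<~RUBY
  # define_method in a comment does not count
  class Widget
    [:a, :b].each do |name|
      define_method(name) { send(:lookup, name) }
    end

    def lookup(name)
      "define_method"
    end
  end
RUBY

counts = count_identifiers(source, %w[define_method send])
```

This also hints at why `send` is harder to measure: an identifier named `send` is not necessarily `Object#send`, since a class can define its own `send` method (a mailer or socket wrapper, say).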

Slide 61

Slide 61 text

QUESTION: \t or SPACES ?

Slide 62

Slide 62 text

ANSWER: YOU ARE A HORRIBLE PERSON IF YOU HARD TAB

Slide 63

Slide 63 text

ANSWER: LUCKILY ONLY 8.4% OF PROJECTS PREDOMINANTLY HARD TAB
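
One plausible way to decide whether a file predominantly hard tabs, sketched under the assumption that only the first character of each line matters. The method name and the rule are illustrative, not Hopper's actual probe:

```ruby
# Hypothetical classifier: compare lines that start with a tab against
# lines that start with a space.
def indentation_style(source)
  tabs = source.lines.count { |line| line.start_with?("\t") }
  spaces = source.lines.count { |line| line.start_with?(" ") }
  return :none if tabs.zero? && spaces.zero?

  tabs > spaces ? :tabs : :spaces
end

tabbed = "def a\n\tb\n\tc\nend\n"
spaced = "def a\n  b\nend\n"
styles = [indentation_style(tabbed), indentation_style(spaced)]
```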

Slide 64

Slide 64 text

median trailing spaces: 31
mean trailing spaces: 531.5
IT IS A HORROR
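
A sketch of one way to count this, assuming "trailing spaces" means lines that end in spaces or tabs. The slide does not say whether lines or characters were counted, so treat the definition as a guess:

```ruby
# Hypothetical counter: number of lines ending in a space or tab.
def trailing_whitespace_lines(source)
  source.lines.count { |line| line.chomp.match?(/[ \t]+\z/) }
end

sample = "a  \nb\nc\t\n\n"
offenders = trailing_whitespace_lines(sample)
```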

Slide 65

Slide 65 text

98.7% of the top Rails projects AVOID SEMICOLONS ...in their JavaScript;

Slide 66

Slide 66 text

just kidding omg stop talking about semicolons they’re ;;;;;;;ing boring

Slide 67

Slide 67 text

SCIENCE #3 THE WORK

Slide 68

Slide 68 text

(mean)
TOTAL LINES 17,316
LINES OF RUBY CODE 4,572
COMMENT LINES 761

Slide 69

Slide 69 text

(median)
TOTAL LINES 1,132
LINES OF RUBY CODE 563
COMMENT LINES 63

Slide 70

Slide 70 text

FROM THIS, WE CAN SEE:
- Popular projects tend to have more non-Ruby code
- Inline documentation is sparse (11%)
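
The two conclusions can be checked against the mean and median figures from the previous two slides. This sketch assumes the 11% is comment lines divided by Ruby lines at the median, which the slide does not state outright:

```ruby
# Figures copied from the mean and median slides.
stats = {
  mean:   { total: 17_316, ruby: 4_572, comments: 761 },
  median: { total: 1_132,  ruby: 563,   comments: 63 }
}

# Percentages, rounded to one decimal place.
ratios = stats.transform_values do |s|
  {
    ruby_share: (100.0 * s[:ruby] / s[:total]).round(1),
    comment_ratio: (100.0 * s[:comments] / s[:ruby]).round(1)
  }
end
```

Ruby makes up about 26% of lines at the mean but about 50% at the median, which fits the claim that the big, popular projects carry more non-Ruby code. The median comment ratio comes out at about 11%.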

Slide 71

Slide 71 text

BRANCHES
1.96 remote branches per project (median: 1)

Slide 72

Slide 72 text

TOTAL COMMITS 417.0 MEAN 110.0 MEDIAN

Slide 73

Slide 73 text

SCIENCE #4 THE RUBY ECOSYSTEM

Slide 74

Slide 74 text

RAKE 78% of projects have a Rakefile

Slide 75

Slide 75 text

BUNDLER 31% of projects have a Gemfile

Slide 76

Slide 76 text

BUNDLER 15% of projects have a Gemfile.lock

Slide 77

Slide 77 text

GEMS 51% of projects had a .gemspec
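
Checks like these come down to looking for well-known files at the top of a checkout. A minimal sketch, where the helper name and the decision to look only at the project root are both assumptions:

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: report which common Ruby ecosystem files exist
# at the top level of a project directory.
def ecosystem_files(root)
  {
    rakefile: File.exist?(File.join(root, "Rakefile")),
    gemfile: File.exist?(File.join(root, "Gemfile")),
    gemfile_lock: File.exist?(File.join(root, "Gemfile.lock")),
    gemspec: !Dir.glob(File.join(root, "*.gemspec")).empty?
  }
end

# Try it on a throwaway directory with a Rakefile and a gemspec.
found = Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "Rakefile"))
  FileUtils.touch(File.join(dir, "demo.gemspec"))
  ecosystem_files(dir)
end
```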

Slide 78

Slide 78 text

CONTAINERS
          CLASSES   MODULES
MEAN           55        44
MEDIAN          8         6
*(includes redefinitions)

Slide 79

Slide 79 text

METHODS
          CLASS   INSTANCE
MEAN         20        266
MEDIAN        2         28

Slide 80

Slide 80 text

SCIENCE #5 THE PAPERWORK

Slide 81

Slide 81 text

LICENSES
47.1% of projects don’t have a license
This is worrisome

Slide 82

Slide 82 text

LICENSES
Across all projects, 44% chose MIT as their license
2.2% Apache
1.1% GPL
0.5% LGPL

Slide 83

Slide 83 text

THE FUTURE

Slide 84

Slide 84 text

ARBITRARY COMPARISONS

Slide 85

Slide 85 text

MULTIPLE LANGUAGES

Slide 86

Slide 86 text

LANGUAGE COMPARISONS

Slide 87

Slide 87 text

MORE D3.JS VISUALIZATIONS

Slide 88

Slide 88 text

GITHUB.COM/HOLMAN/HOPPER CODESTAT.US

Slide 89

Slide 89 text

@HOLMAN ZACH HOLMAN ZACHHOLMAN.COM/TALKS