Aggressively Probing Ruby Projects

So we built an employee-driven, geographically-distributed, multi-client, HTML5-based, API-centric, bathroom-enabled, buzzword-embracing music server for the GitHub offices. It's been a fun project to explore company culture, CSS frameworks, JavaScript methodologies, native clients, outside contributions, and to discover who gets really angry when Garth Brooks starts singing on the speakers. It's one of those projects that ended up far more nutty than the original idea. Come steal some ideas for your own projects.

Zach Holman

April 19, 2012

Transcript

  1. PROBING RUBY PROJECTS AGGRESSIVELY

  2. @HOLMAN

  3. None
  4. METHODOLOGY

  5. SO I WROTE CODE THAT DESCRIBES RUBY

  6. INDEX + ANALYZE

  8. 1 REPOSITORY 2 PROBE ANALYSIS 3 REPORTING

  9. 1 REPOSITORY git clone <repo>

  10. 2 PROBE ANALYSIS “Probe”: a class that looks for something in your project

  11. 3 REPORTING run a report on what we discovered about your project
  12. REPOSITORY PROBES REPORTS

  13. REPOSITORY PROBES REPORTS Simple, right?

  14. REPOSITORY PROBES REPORTS 15,698 (diagram: repository icons)

  15. REPOSITORY PROBES REPORTS 15,698 (diagram: repository icons) 17 FILES, 41 TESTS (diagram: probe icons)

  16. 10 REPO SLICES REPOSITORY PROBES REPORTS 15,698 17 FILES, 41 TESTS (diagram: repository, probe, and report icons)
  17. REPOSITORY SLICES - Split all SHAs into ten slices -

    Probe history at each slice
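The slicing step can be sketched as follows. This is a minimal illustration, not Hopper's actual code: `slice_points` is a hypothetical helper, and it assumes the revision list is already available as an array of SHAs ordered oldest to newest. It picks the last SHA of each of (up to) ten roughly equal chunks as the revision to probe.

```ruby
# Hypothetical helper: pick one representative SHA per slice of history.
# Assumes `shas` is ordered oldest to newest.
def slice_points(shas, slices = 10)
  return [] if shas.empty?

  # Round the chunk size up so we never produce more than `slices` chunks.
  chunk_size = (shas.size / slices.to_f).ceil
  shas.each_slice(chunk_size).map(&:last)
end

# Made-up history of 95 revisions, purely for illustration.
history = (1..95).map { |i| format("sha%03d", i) }
points = slice_points(history)
puts points.inspect
```

Projects with fewer than ten revisions simply yield one slice per revision, so short histories are probed at every commit.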
  18. DO THE MATH 80,000,000,000,000 metrics (or something)

  19. GOAL: Index projects, see how they change over time, and

    compare them against all indexed projects
  20. CAVEATS

  21. NOT A STATISTICIAN ...but this is OPEN SOURCE (so if you are one, fix it)
  22. GitHub-only API v3 Newer Projects Popular Projects Only Public Code
  23. This is a new project. It’s for fun. Take it

    lightly (for now).
  24. OTHERWISE, everything else in this is 100% ACCURATE and won’t

    generate an INTERNET FLAMEWAR maybe probably.
  25. HOPPER

  26. github.com/holman/hopper

  27. None
  28. HOPPER SINATRA REDIS RESQUE HEROKU 1.9 LIBGIT2

  29. SINATRA Simple UI PJAX Minimal Frontend Mustache SCSS + CoffeeScript

  30. REDIS Primarily schemaless Dynamic probes Easy bootstrapping Redis To Go

  31. HEROKU Easy deployment Cedar stack is rad, yo

  32. RESQUE Independent workers Millions of jobs

  33. 1.9 AST parsing with Ripper Stop hurting Ruby. Use 1.9.
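As a rough illustration of AST parsing with Ripper (which ships in the standard library from Ruby 1.9 on), `Ripper.sexp` turns source text into nested arrays whose first element names the node type. The sketch below walks that structure and tallies definitions. `count_nodes` and the sample source are hypothetical and are not taken from Hopper.

```ruby
require "ripper"

# Hypothetical helper: walk the nested arrays returned by Ripper.sexp and
# tally how often each of the requested node types appears.
def count_nodes(sexp, types, counts = Hash.new(0))
  return counts unless sexp.is_a?(Array)

  counts[sexp.first] += 1 if types.include?(sexp.first)
  sexp.each { |child| count_nodes(child, types, counts) }
  counts
end

# Made-up source to analyze; the quoted heredoc tag disables interpolation.
source = <<~'RUBY'
  module Greeter
    class Hello
      def self.build; new; end
      def greet(name); "hi #{name}"; end
    end
  end
RUBY

tree = Ripper.sexp(source)
counts = count_nodes(tree, [:module, :class, :def, :defs])
puts counts.inspect
```

In Ripper's output `:def` marks instance method definitions and `:defs` marks singleton definitions such as `def self.build`, which is one way a probe could separate class methods from instance methods.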

  34. LIBGIT2 libgit2: linkable Git library rugged: a gem for libgit2 fast, easy Git access
  35. PROBING

  36. WRITING NEW PROBES SHOULD BE TRIVIAL

  37. APP/PROBES/#{PROBE}.RB

    class Loc < Probe
      exposes :lines # Report on these methods

      def lines
        repository.files.map do |file|
          repository.read(file).to_s.lines.count
        end.sum
      end
    end

  43. APP/MODELS/REPOSITORY.RB

    class Repository
      # Signatures abbreviated as on the slide; bodies omitted.
      def lines; end
      def read; end
      def revisions; end
      def commit_messages; end
    end

    COMMON HELPER METHODS CAN BE ABSTRACTED TO SVN, HG, ETC.
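To show how the `Loc` probe and a repository object could fit together, here is a self-contained sketch. The `Probe` base class, its `exposes` macro, the `report` method, and `FakeRepository` are all hypothetical stand-ins written for this example; Hopper's real classes may look quite different.

```ruby
# Hypothetical stand-in for a Probe base class.
class Probe
  # Class-level macro: remember which methods a report should include.
  def self.exposes(*names)
    exposed.concat(names)
  end

  def self.exposed
    @exposed ||= []
  end

  attr_reader :repository

  def initialize(repository)
    @repository = repository
  end

  # Call every exposed method and collect the results by name.
  def report
    self.class.exposed.each_with_object({}) do |name, data|
      data[name] = public_send(name)
    end
  end
end

# In-memory repository so the example runs without Git.
class FakeRepository
  def initialize(files)
    @files = files
  end

  def files
    @files.keys
  end

  def read(file)
    @files[file]
  end
end

# The probe from the slides, unchanged.
class Loc < Probe
  exposes :lines

  def lines
    repository.files.map do |file|
      repository.read(file).to_s.lines.count
    end.sum
  end
end

repo = FakeRepository.new("a.rb" => "x = 1\ny = 2\n", "b.rb" => "puts x\n")
puts Loc.new(repo).report.inspect
```

Because the probe only calls `files` and `read`, any object that answers those two methods can stand in for a Git checkout, which is the property that would let the helpers be abstracted to other version control systems.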
  44. A QUICK ASIDE: RUGGED IS COOL

    repo = Rugged::Repository.new(path)
    repo.lookup('2cc3e9a').message
    # => "Run an resque worker\n"
  45. A QUICK ASIDE: RUGGED IS COOL

    walker = Rugged::Walker.new(repo)
    walker.push('2cc3e9a')
    walker.map(&:oid) # => [an, array, of, shas]
  46. A QUICK ASIDE: RUGGED IS COOL

    - Faster than shelling out (naturally)
    - Write or stage new commits
    - Multi-platform, permissive license, bindings for all major languages Ruby: gem install rugged
    - Faster than any other Git library
  47. SCIENCE!

  48. REMEMBER YOUR SCHOOLING: [1, 2, 3, 4, 4] MEAN: 2.8 MEDIAN: 3 MODE: 4
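The three averages on this slide can be checked with a few lines of Ruby. The helper names below are illustrative only and are not part of Hopper.

```ruby
# Arithmetic mean: sum divided by count.
def mean(values)
  values.sum.to_f / values.size
end

# Median: middle value of the sorted list (average of the two middle
# values when the list has an even number of entries).
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Mode: the most frequent value (the first one found if there is a tie).
def mode(values)
  values.group_by(&:itself).max_by { |_value, group| group.size }.first
end

data = [1, 2, 3, 4, 4]
puts "mean: #{mean(data)} median: #{median(data)} mode: #{mode(data)}"
```

The distinction matters for the figures that follow: a handful of very large projects can pull the mean far above the median.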
  49. PROJECTS 15,698 most-forked Ruby projects on GitHub

  50. SCIENCE #1 MOST PROJECTS ARE LONELY

  51. CONTRIBUTORS 3.77 (MEAN)

  52. CONTRIBUTORS 2 (MEDIAN)

  53. Popular projects get a disproportionate amount of help.

  54. FOLLOWERS 22 (MEDIAN)

  55. FORKS 5 (MEDIAN) five forks for two contributors means three

    inactive or ignored forks
  56. Again, this doesn’t take into account the bottom 90%, either.

  57. Open source is a long, long, lonely tail.

  58. SCIENCE #2 OFFENSIVE RUBY CODE IS OFFENSIVE

  59. SWEAR WORDS 0.50 PER PROJECT

  60. PERHAPS MORE OFFENSIVELY, DEFINE_METHOD()S 4.27 PER PROJECT (and 30.1 SEND()s, but that’s harder to measure)
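One way such counts could be gathered is to lex the source with Ripper and count identifier tokens, which avoids matching the same words inside strings or comments. This is a sketch under that assumption; `count_identifiers` is a hypothetical helper and the talk does not say how Hopper actually measures it.

```ruby
require "ripper"

# Hypothetical helper: count identifier tokens matching the given names.
# Lexing, rather than a regex over raw text, skips strings and comments.
def count_identifiers(source, names)
  counts = Hash.new(0)
  Ripper.lex(source).each do |(_position, type, token)|
    counts[token] += 1 if type == :on_ident && names.include?(token)
  end
  counts
end

# Made-up source; the quoted heredoc tag disables interpolation.
source = <<~'RUBY'
  class Model
    # define_method in a comment does not count
    [:name, :email].each do |attr|
      define_method("#{attr}?") { !send(attr).nil? }
    end

    def label
      "send me a define_method"
    end
  end
RUBY

counts = count_identifiers(source, %w[define_method send])
puts counts.inspect
```

A token count cannot tell `Object#send` apart from an unrelated method that happens to be named `send`, which may be part of why that number is harder to pin down.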
  61. QUESTION: \t or SPACES ?

  62. ANSWER: YOU ARE A HORRIBLE PERSON IF YOU HARD TAB

  63. ANSWER: LUCKILY ONLY 8.4% OF PROJECTS PREDOMINANTLY HARD TAB

  64. mean trailing spaces: 531.5 median trailing spaces: 31 IT IS A HORROR
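A whitespace probe along these lines could be sketched as below. `whitespace_stats` is a hypothetical helper, and it assumes "trailing spaces" means lines that end in spaces or tabs; the slides do not say whether Hopper counts lines or characters.

```ruby
# Hypothetical helper: classify indentation and count trailing whitespace.
# Note that whitespace-only lines count as both indented and trailing.
def whitespace_stats(source)
  lines = source.lines.map(&:chomp)
  {
    tab_indented:   lines.count { |line| line.start_with?("\t") },
    space_indented: lines.count { |line| line.start_with?(" ") },
    trailing:       lines.count { |line| line.match?(/[ \t]+\z/) }
  }
end

# Made-up sample: one tab-indented line with trailing spaces and one
# space-indented line.
sample = "def a\n\tputs 1  \nend\n  x = 1\n"
stats = whitespace_stats(sample)
hard_tabbed = stats[:tab_indented] > stats[:space_indented]
puts stats.inspect
puts "predominantly hard tabs: #{hard_tabbed}"
```

Comparing the two indentation counts is one plausible reading of "predominantly hard tab"; other thresholds would be equally defensible.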
  65. 98.7% of the top Rails projects AVOID SEMICOLONS ...in their

    JavaScript;
  66. just kidding omg stop talking about semicolons they’re ;;;;;;;ing boring

  67. SCIENCE #3 THE WORK

  68. TOTAL LINES 17,316 LINES OF RUBY CODE 4,572 COMMENT LINES 761 (mean)

  69. TOTAL LINES 1,132 LINES OF RUBY CODE 563 COMMENT LINES 63 (median)
  70. FROM THIS, WE CAN SEE: Popular projects tend to have

    more non-Ruby code Inline documentation is sparse (11%)
  71. BRANCHES 1.96 remote branches per project median: 1

  72. TOTAL COMMITS 417.0 MEAN 110.0 MEDIAN

  73. SCIENCE #4 THE RUBY ECOSYSTEM

  74. RAKE 78% of projects have a Rakefile

  75. BUNDLER 31% of projects have a Gemfile

  76. BUNDLER 15% of projects have a Gemfile.lock

  77. GEMS 51% of projects had a .gemspec
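Checks like the four above come down to looking for well-known files at the root of a checkout. A minimal sketch follows, with `ecosystem_files` as a hypothetical helper and a temporary directory standing in for a cloned project.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: report which ecosystem files exist at a checkout's root.
def ecosystem_files(root)
  {
    rakefile:     File.exist?(File.join(root, "Rakefile")),
    gemfile:      File.exist?(File.join(root, "Gemfile")),
    gemfile_lock: File.exist?(File.join(root, "Gemfile.lock")),
    gemspec:      !Dir.glob(File.join(root, "*.gemspec")).empty?
  }
end

# Build a throwaway "project" with a Rakefile and a gemspec only.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "Rakefile"))
  FileUtils.touch(File.join(dir, "demo.gemspec"))
  puts ecosystem_files(dir).inspect
end
```

Aggregating these booleans across every indexed project would give percentages of the kind shown on the slides.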

  78. CONTAINERS CLASSES: MEAN 55, MEDIAN 8 MODULES: MEAN 44, MEDIAN 6 *(includes redefinitions)
  79. METHODS CLASS: MEAN 20, MEDIAN 2 INSTANCE: MEAN 266, MEDIAN 28

  80. SCIENCE #5 THE PAPERWORK

  81. LICENSES 47.1% of projects don’t have a license This is

    worrisome
  82. LICENSES Across all projects, 44% chose MIT as their license

    2.2% Apache 1.1% GPL 0.5% LGPL
  83. THE FUTURE

  84. ARBITRARY COMPARISONS

  85. MULTIPLE LANGUAGES

  86. LANGUAGE COMPARISONS

  87. MORE D3.JS VISUALIZATIONS

  88. GITHUB.COM/HOLMAN/HOPPER CODESTAT.US

  89. @HOLMAN ZACH HOLMAN ZACHHOLMAN.COM/TALKS