
GitHub Flavored Ruby

Someone once told me that software development is a constant battle against complexity. Over the past three years we've built several large systems at GitHub and if anything, that saying is an understatement. Things like tight coupling, insufficient testing or documentation, lack of versioning discipline, and underspecified design documents can easily lead you down a path of ruin. In this talk I'll cover several of the techniques we use at GitHub to defend against complexity in our Ruby systems, including modularization, Readme Driven Development, Rakegem, TomDoc, Semantic Versioning.

Tom Preston-Werner

November 12, 2011

Transcript

  1. GITHUB
    FLAVORED
    RUBY
    Tom Preston-Werner
    Hello, my name is Tom Preston-Werner


  2. @mojombo
    you should follow me
    and read my blog
    tom.preston-werner.com
    You can find me on Twitter and GitHub as mojombo,
    and read my blog at http://tom.preston-werner.com.


  3. I’m a cofounder and CTO of GitHub.


  4. RakeGem
    Readme Driven Development
    TomDoc
    Semantic Versioning
    Relentless Modularization
    Today I’m going to talk about five ideas that we use at GitHub to streamline how we approach
    building Rubygems.


  5. RELENTLESS
    MODULARIZATION
    --2 MINUTES--


  6. [Chart: lines of code on a 0 to 20,000 scale, comparing two projects labeled Tinything and Bigshit]
    Think of a small project you’ve done. Maybe it has 1000 lines of code. It’s a pleasure to work
    with. Easy to maintain. You love working on it.
    Now think of a huge monolithic project you’ve been part of. I’m betting you’d give anything
    to stay away from that code.


  7. FUUUUUUUUUUUUUUUU
    How the hell does this even happen!?
    Large code has a tendency to become messy and tightly coupled. Sometimes you don’t even
    realize this is happening. Without a weapon to fight this trend, you’ll end up spending your
    days untangling slinkies instead of clapping like an idiot while they slink down the stairs.


  8. “How do I decide what
    to modularize?”

    Sometimes it can be tricky to decide what to modularize and when something should be
    extracted. There’s a simple heuristic I like to use. Modularize...


  9. EVERYTHINGGGGGGGGG
    EVERYTHINGGGGGGGGGGGG. If you find yourself wondering if you should modularize
    something or not, just remember this baby staring into your soul and you’ll do the right
    thing.


  10. github.com
    grit
    smoke
    chimney
    bertrpc
    proxymachine
    ernie
    failbot
    gerve
    resque
    RockQueue
    jekyll
    nodeload
    albino
    markup
    camo
    gollum
    heaven
    stratocaster
    amen
    haystack
    hubot
    services
    help.github.com
    jobs
    At GitHub we embrace modularization in a big way. We continually extract pieces of the main
    GitHub.com Rails app into their own components. Then we give them funny names.


  11. A neat trick that I use to approach modularity is by remembering my good childhood friend
    Mr. Rogers. He liked to make believe, and so do I.


  12. Make Believe
    Open Source
    Libraries
    I make believe that whatever I’m working on is going to be open sourced. This forces me to
    use proper abstractions and prevents me from coupling the code too tightly with the main
    app.


  13. Make Believe
    Open Source
    Libraries
    Even better is if you really DO open source your libraries and components. We’re huge fans
    of this at GitHub. We try to open source anything that does not represent core business value.


  14. MODULARIZE TO
    PREVENT PAIN
    KEY CONCEPT
    Small projects are easy and enjoyable to write and maintain. Big projects are hard and suck to
    maintain. Save yourself some pain and modularize like you mean it!


  15. README DRIVEN
    DEVELOPMENT
    --6 MINUTES--


  16. [waterfall]
    In 1970 Winston Royce wrote a paper about managing large software projects. In it he outlined a
    methodology now known as Waterfall. Even though he presented this system as an example
    of what NOT to do, enterprises and governments ignored that and started using it anyway. =/
    Overspecifying requirements is a disaster. We’ve all embraced Agile techniques to escape its
    tyranny.


  17. [cowboy]
    I don’t follow
    anyone’s rules...
    ...not even
    my own
    But in reaction to Waterfall, we’re tempted to go too far in the other direction and become
    cowboy coders. This is just as bad.


  18. A perfect implementation
    of the wrong specification
    is useless.
    Either way you can end up with the wrong specification. A perfect implementation of the
    wrong specification is useless.


  19. IS THERE A MIDDLE GROUND?
    there must be a middle ground, right?
    Surely there must be some solution that lies between these two extremes. Something that’s
    not OVER specified or UNDER specified.


  20. WRITE YOUR README
    FIRST
    There’s already a document that we write that contains the information we need to
    understand a project and how it works. It’s called the README. What if we wrote our
    READMEs first? We could think through the problem domain enough to prevent big mistakes,
    but still leave ourselves with enough flexibility to end up with a correct implementation.


  21. Readme.md
    Spec.md
    When I first started doing this, it was amazing. But it can be confusing if you have an empty
    repository with just a README file and no implementation. I’ve solved this problem by
    renaming README to SPEC during the initial phase. Then I move parts of the SPEC into the
    README as I implement features, thereby keeping the code and the docs in sync.


  22. google://readme
    driven development
    I’ve written a blog post that further explains this idea. It’s on my weblog. Just search for
    “readme driven development” and it’ll be the first result.


  23. USE RDD TO SPECIFY THE
    RIGHT PRODUCT
    KEY CONCEPT
    RDD helps you build better software: writing down your thoughts before you start coding
    catches big mistakes early, and keeping the spec to README length stops you from locking in
    the wrong specification by writing too much.


  24. RAKEGEM
    --12 MINUTES--


  25. Rakegem is a Minecraft plugin I created that totally makes it easy to harvest
    Rubies from a standard grass block. It’s really great when...
    Naw, I’m just kidding.


  26. RAKE-BASED GEM BUILDER
    and deployer, doccer, tester, and manifester
    Rakegem is a flexible, customizable Rake-based gem builder, and more.


  27. github.com/
    mojombo/
    rakegem
    If you want to follow along, load up this URL.
    You’ll see just how simple it really is.


  28. NO DEPENDENCIES
    like, for real. no gems involved*.
    * except yours, duh
    Rakegem has NO DEPENDENCIES whatsoever.


  29. HAND-ROLLED GEMSPEC
    +
    SIMPLE RAKE TASKS
    RubyGems already has a great system for specifying everything about how a gem works.
    It’s called the gemspec. Rakegem gives you a template gemspec that’s easy to fill out and
    doesn’t involve any magic. It combines that with a simple Rakefile that handles all the build
    and release dynamics for you.


  30. GEMSPEC
    Here’s what the gemspec template looks like. It provides a lot of guidance about how to write
    your gemspec so you don’t have to dig through mountains of documentation.
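
To make the idea of a hand-rolled gemspec concrete, here is a minimal sketch. It is not the Rakegem template itself; the gem name "scoped" is borrowed from the rake output shown in this talk, and every other value (author, email, homepage, license, file list) is a placeholder assumed for illustration.

```ruby
# Hypothetical gemspec for a gem called "scoped"; all values are illustrative.
require "rubygems"

spec = Gem::Specification.new do |s|
  s.name     = "scoped"
  s.version  = "0.1.0"
  s.summary  = "One-line description of what the gem does."
  s.authors  = ["Your Name"]
  s.email    = "you@example.com"
  s.homepage = "https://example.com/scoped"
  s.license  = "MIT"

  # List files explicitly so the packaged gem contains no surprises.
  s.files         = %w[README.md Rakefile lib/scoped.rb]
  s.require_paths = %w[lib]

  # Runtime dependencies would be declared here, for example:
  # s.add_dependency "some_gem", "~> 1.0"
end
```

Because the gemspec is plain Ruby, anything a Rakefile needs (name, version, file list) can be read straight from the specification object rather than duplicated elsewhere.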


  31. RAKEFILE
    The Rakefile can be copied directly to your project without modification. Everything it needs it
    can get from the gemspec.


  32. $ rake -T
    rake build # Build scoped-0.1.0.gem into the pkg directory
    rake clobber_rdoc # Remove rdoc products
    rake console # Open an irb session preloaded with this library
    rake coverage # Generate RCov test coverage and open in your browser
    rake gemspec # Generate scoped.gemspec
    rake rdoc # Build the rdoc HTML Files
    rake release # Create tag v0.1.0 and build and push scoped-0.1.0.gem to Rubygems
    rake rerdoc # Force a rebuild of the RDOC files
    rake test # Run tests
    rake validate # Validate scoped.gemspec
    It adds Rake tasks for all your normal needs: building the gem and docs, running tests, and
    doing releases.


  33. RAKEGEM —
    CUSTOMIZATION
    The beauty of this system is that it’s infinitely customizable. Since the entire system is
    embedded in your project as simple code, you can change anything you want to get the
    perfect workflow.


  34. RAKEFILE
    Here’s what the release task looks like. I like to use a version number that looks like “vX.Y.Z”,
    but maybe you don’t. To change how Rakegem works, just change that line of code!
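
The slide's code is not reproduced in this transcript, so what follows is only a sketch of how a release step with a configurable tag format could look. The names TAG_FORMAT and release_commands are hypothetical, and the exact git and gem commands (including the "master" branch name) are assumptions, not a copy of Rakegem's Rakefile.

```ruby
# Sketch only: builds the shell commands a release task could run.
# TAG_FORMAT and release_commands are hypothetical names for illustration.
TAG_FORMAT = "v%s" # change to "%s" or "release-%s" for a different scheme

# Internal: Build the ordered list of commands for a release.
#
# name    - The String gem name.
# version - The String version, e.g. "0.1.0".
#
# Returns an Array of String shell commands.
def release_commands(name, version)
  tag = format(TAG_FORMAT, version)
  [
    "git commit --allow-empty -a -m 'Release #{version}'",
    "git tag #{tag}",
    "git push origin master", # assumes releases are cut from "master"
    "git push origin #{tag}",
    "gem push pkg/#{name}-#{version}.gem"
  ]
end

# In a Rakefile this might be wired up as:
#
#   task :release => :build do
#     release_commands(name, version).each { |cmd| sh cmd }
#   end
```

Changing the tagging convention then means editing one line, which is the kind of customization the talk is describing.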


  35. STOP FIGHTING YOUR GEM
    BUILDING SYSTEM
    KEY CONCEPT
    Your gem management system should be simple and customizable. Rakegem gives you the
    ultimate power and freedom to get things done without any hassle.


  36. TOMDOC
    --16 MINUTES--


  37. FOUR LEVELS
    of documentation
    Line
    Code
    API
    Book
    I’ve identified four levels of code documentation. Line-level docs explain tricky lines of code
    within methods. Code-level docs describe how methods or classes work. API-level docs are
    for end users of your library. Book-level docs provide a long-format overview suitable for
    beginners.


  38. FOUR LEVELS
    of documentation
    Code
    TomDoc is my solution to Code-level docs.


  39. tomdoc.org
    If you’d like to follow along, you can find the specification at this URL.


  40. WHY DOCUMENT CODE?
    what does it do?
    is it considered public?
    what params are expected?
    what types are the params?
    what are valid options?
    how do I use the damn thing?
    what type is the return?
    There are a lot of things we ask ourselves when looking at new code. Ruby is especially
    difficult to unravel because of its flexibility. If we don’t write down what we’re thinking when
    we write code, that information is easily lost to the ghosts of time.


  41. PAST TOM AND FUTURE TOM
    I’d like to introduce you to Past Tom. He’s been looking out for me for a long time. Four
    years ago he was writing TomDoc that I still read today. Every time I’m coding now, I think
    about Future Tom. If I write good docs, I know he’ll look back at me from the future and give
    me two big thumbs up, because I’ve saved him a ton of time and stress.


  42. class Gollum
        class Wiki
          #
          #
          #
          def exist?
            # ...
          end
        end
      end
    Here’s some code. If all we have is the method signature, it’s hard to tell what’s going on.
    Even something simple like what type it returns requires reading the code.


  43. class Gollum
        class Wiki
          # Public: Check whether the wiki's git repo exists on the filesystem.
          #
          # Returns true if the repo exists, and false if it does not.
          def exist?
            # ...
          end
        end
      end
    what does it do?
    is it considered public?
    what type is the return?
    With just a few short words, we can solve a lot of problems and make sure that future
    developers that work with this code don’t change it in unpredictable ways.


  44. class Gollum
    class Wiki
    #
    #
    #
    #
    #
    #
    #
    #
    #
    #
    #
    #
    #
    #
    #
    #
    #
    #
    def write_page(name, format, data, commit = {})
    # ...
    end
    end
    end
    Maybe you think that’s too trivial and reading the code would be fine. Ok, how about this
    example. Not so simple now, is it? We can get some idea of what the method does, and even
    though the argument names are good, there is no visibility into specifics about either. As
    coders, we rely on specifics to write good code.


  45. class Gollum
        class Wiki
          # Public: Write a new version of a page to the Gollum repo root.
          #
          # name   - The String name of the page.
          # format - The Symbol format of the page.
          # data   - The new String contents of the page.
          # commit - The commit Hash details:
          #          :message   - The String commit message.
          #          :name      - The String author full name.
          #          :email     - The String email address.
          #          :parent    - Optional Grit::Commit parent to this update.
          #          :tree      - Optional String SHA1 of the tree to create the
          #                       index from.
          #          :committer - Optional Gollum::Committer instance. If provided,
          #                       assume that this operation is part of a batch of
          #                       updates and the commit happens later.
          #
          # Returns the String SHA1 of the newly written version, or the
          # Gollum::Committer instance if this is part of a batch update.
          def write_page(name, format, data, commit = {})
            # ...
          end
        end
      end
    what params are expected?
    what types are the params?
    what are valid options?
    With a little bit of extra work we can illuminate what this method does and make it obvious
    how to use it without having to dig through long method chains and a ton of code.


  46. class Gollum
        class Page
          #
          #
          #
          #
          #
          #
          #
          #
          #
          #
          def self.cname(name)
            # ...
          end
        end
      end
    One last example. Here’s a simple method. The name was obvious to me when I wrote it, but
    two years later, it’s a different story.


  47. class Gollum
        class Page
          # Convert a human page name into a canonical page name.
          #
          # name - The String human page name.
          #
          # Examples
          #
          #   Page.cname("Bilbo Baggins")
          #   # => 'Bilbo-Baggins'
          #
          # Returns the String canonical name.
          def self.cname(name)
            # ...
          end
        end
      end
    how do I use the damn thing?
    With just a few short lines of TomDoc, I’ve ensured that every developer that sees this code
    for the rest of time will understand and be able to use this method in the proper fashion.
    That’s a pretty big benefit for a few minutes of effort!
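
The slides elide the method bodies, so here is a small runnable sketch that pairs a TomDoc comment with a working implementation. PageName.canonical is a hypothetical stand-in, not Gollum's actual cname method, and it assumes the simplest possible rule: every space becomes a hyphen.

```ruby
module PageName
  # Public: Convert a human page name into a canonical page name by
  # replacing each space with a hyphen.
  #
  # name - The String human page name.
  #
  # Examples
  #
  #   PageName.canonical("Bilbo Baggins")
  #   # => "Bilbo-Baggins"
  #
  # Returns the String canonical name.
  def self.canonical(name)
    name.tr(" ", "-")
  end
end
```

The comment follows the same order as the slides: a one-line description, the arguments with their types, an example, and the return value.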


  48. The TomDoc specification is designed to be as simple as possible. You should be able to read
    the spec once and know how to write TomDoc without referring back to it very often. Code
    docs should be optimized for humans. We are the ones reading and writing them.


  49. ONE MORE THING
    Oooh.


  50. This is Eric Hodel. He likes hats. He also likes TomDoc, and he just happens to be the
    maintainer of RDoc.


  51. RDOC 3.10 WILL SUPPORT
    TOMDOC
    He’s added TomDoc support to the latest versions of RDoc and if you install 3.10 or later,
    you can convert your TomDoc’d code to nice HTML output without any extra tools!


  52. rdoc --format=tomdoc
    Here’s the magic incantation.


  53. And here’s what it looks like.


  54. Kickass.


  55. CODE DOCUMENTATION IS
    FOR HUMANS
    KEY CONCEPT
    Stop optimizing your docs for machines, and start writing them for Future You. TomDoc is
    easy to write, easy to read, and saves everyone a boatload of time.


  56. SEMANTIC
    VERSIONING
    --23 MINUTES--


  57. DEPENDENCY HELL
    Version Lock
    Version Promiscuity
    There’s a dread place in software development called dependency hell. It’s where you end up
    when you have version requirements that are either overly specific or so broad that
    incompatible versions can sneak in and screw up your system.


  58. semver.org
    You can find the Semantic Versioning spec at this URL. It’s very short and easy to follow.


  59. PUBLIC API
    Remember TomDoc?
    The hardest part of implementing SemVer is defining a public API for your project. Without a
    public API that tells people what classes/methods/etc they can and cannot use, it is
    impossible to tell users how those things change over time. Remember TomDoc? If you use
    TomDoc and the Public/Internal/Deprecated designators, you can easily define your public
    API without a lot of extra work. So do that.
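
As a rough illustration of how the Public/Internal/Deprecated designators mark out an API, consider the sketch below. The Counter class and its methods are invented for this example; only the designator prefixes come from TomDoc.

```ruby
class Counter
  # Public: Initialize a Counter.
  #
  # start - The Integer starting value (default: 0).
  def initialize(start = 0)
    @value = start
  end

  # Public: Get the current count.
  #
  # Returns the Integer count.
  def value
    @value
  end

  # Public: Increase the count by one.
  #
  # Returns the new Integer count.
  def increment
    add(1)
  end

  # Deprecated: Increase the count by one. Use #increment instead.
  #
  # Returns the new Integer count.
  def bump
    increment
  end

  # Internal: Add an arbitrary amount to the count.
  #
  # amount - The Integer amount to add.
  #
  # Returns the new Integer count.
  def add(amount)
    @value += amount
  end
end
```

Under this convention, only changes to the Public methods would count when deciding how the version number has to move; the Internal method could change freely between releases.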


  60. 2.4.3
    major minor patch
    In SemVer, there are three numbers that comprise the version number. Major, minor, and
    patch.


  61. MAJOR
    backwards incompatible
    big changes
    The major version number must be incremented anytime the public API changes in a
    backwards incompatible way. If you’re a responsible software developer, you don’t want this
    to happen very often. Maintaining backwards compatibility is a big part of not screwing over
    your users.


  62. MINOR
    backwards compatible
    new functionality
    big internal changes
    may contain bug fixes
    The minor version must be incremented when new functionality is added to the public API.
    These changes must always be backwards compatible.


  63. PATCH
    backwards compatible
    bug fixes only
    The patch version must be incremented if bugs are fixed to bring the code back into line with
    the documentation. These must always be backwards compatible and must not change the
    public API in any way.
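
The three rules can be summarized in a few lines of Ruby. This is a sketch, with next_version as a hypothetical helper name. It also applies the SemVer rule that lower-order numbers reset to zero when a higher-order number is incremented, and it assumes a plain "MAJOR.MINOR.PATCH" string with no pre-release suffix.

```ruby
# Public: Compute the next version number for a given kind of change.
#
# version - The String current version in "MAJOR.MINOR.PATCH" form.
# change  - The Symbol kind of change: :major, :minor, or :patch.
#
# Returns the String next version.
# Raises ArgumentError if change is not one of the three known kinds.
def next_version(version, change)
  major, minor, patch = version.split(".").map { |part| Integer(part, 10) }
  case change
  when :major then "#{major + 1}.0.0"               # incompatible API change
  when :minor then "#{major}.#{minor + 1}.0"        # compatible new functionality
  when :patch then "#{major}.#{minor}.#{patch + 1}" # compatible bug fix
  else raise ArgumentError, "unknown change type: #{change.inspect}"
  end
end

next_version("2.4.3", :patch) # => "2.4.4"
next_version("2.4.3", :minor) # => "2.5.0"
next_version("2.4.3", :major) # => "3.0.0"
```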


  64. gem "gollum", "~> 2.4"
    BUNDLER
    If you follow these rules, you can avoid dependency hell in your project by using Bundler’s
    pessimistic version constraint operator. This rule means that any version >= 2.4.0 and <
    3.0.0 will satisfy the requirement.
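
The range can be checked directly with the version classes that ship with RubyGems. In this short sketch, satisfies? is a hypothetical helper name, while Gem::Requirement and Gem::Version are the standard RubyGems classes.

```ruby
require "rubygems"

# Public: Check whether a version satisfies a constraint.
#
# constraint - The String requirement, e.g. "~> 2.4".
# version    - The String version to test, e.g. "2.4.3".
#
# Returns true if the version is allowed, and false if it is not.
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

satisfies?("~> 2.4", "2.3.9") # => false
satisfies?("~> 2.4", "2.4.0") # => true
satisfies?("~> 2.4", "2.9.0") # => true
satisfies?("~> 2.4", "3.0.0") # => false
```

Writing the constraint with two segments ("~> 2.4") is what allows minor and patch updates while still excluding the next major version.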


  65. If you’re worried about large version numbers, you can relax. They’re numbers. It’s not like
    they’re going to run out.


  66. USE VERSION NUMBERS TO
    CONVEY MEANING
    KEY CONCEPT
    Why bother with three part version numbers if you’re not going to convey consistent meaning
    with them? You may as well just use a single incrementing number if that’s the case. If you
    follow SemVer you can save yourself from dependency hell.


  67. SUMMARY
    --28 MINUTES--


  68. Is your system broken
    down into small
    manageable pieces?
    RELENTLESS
    MODULARIZATION


  69. Are you wasting time
    because of too much or
    too little planning?
    README DRIVEN
    DEVELOPMENT


  70. Are you fighting your gemspec
    or working with it?
    RAKEGEM


  71. Is your documentation
    aimed at the right
    audience?
    TOMDOC


  72. Does your versioning
    scheme help or hinder?
    SEMANTIC
    VERSIONING


  73. gracias
    @mojombo
    Join me for
    free beers
    at the
    GitHub Drinkup
    tonight!
