Scaling GitHub

A month after launching, GitHub hosted one thousand repositories. Three years later, we host over three million. In the same time we've gone from one thousand users to over a million.

This type of scaling presents some interesting technical challenges. I'll dig into our development workflow and how we address concepts like scaling, deployment, code review, and testing.

It presents some interesting business challenges, too. How you grow your company from three employees, how you work in teams, and how you split your app up into services all help ensure that you'll be able to react to your product's growth.

http://zachholman.com/talk/scaling-github

Zach Holman

January 26, 2012

Transcript

  1. Scaling GitHub

  2. Two problems.

  3. TECHNICAL: SyntaxError: compile error.
    ORGANIZATIONAL: I’m too hungover to work.

  4. Scaling is people + technology

  5. Organizational
    jeez humans are so finicky

  6. [Chart: Happiness vs Productivity; axis ticks from 0 to 1,000,000, annotated $ to $$$]

  7. happy employees are
    productive employees

  8. productive employees
    are happy employees

  9. This isn’t a “management problem”.
    Everyone needs to worry about this.

  10. Hiring an employee is the
    most TOXIC thing
    you can do to your startup.

  11. Hiring an employee is the
    most TOXIC thing
    you can do to your startup.
    work slower
    more bugs, less features
    worse culture

  12. Hiring an employee is the
    most EXCITING thing
    you can do to your startup.
    work faster
    fewer bugs, more features
    better culture

  13. so how can you
    score excitement
    and avoid the toxic?

  14. TOXIC EXCITEMENT
    would be a great name for a rock band
    yeah, i know...

  15. Keep your employees happy.
    Really happy.

  16. Your servers, offices, and ideas are bullshit.
    Worry about your coworkers.

  17. EMPLOYEES
    Know your codebase
    Know your process
    Know your mistakes
    Know your mission
    Know your jokes
    Know your priorities
    NEW HIRES
    Don’t know jack

  18. Imprison your employees with happiness and
    nice things and cuddly work practices.

  19. GitHub Jail
    work whenever you want
    work however you want
    work on what you want
    health, dental, vision
    paid conference trips
    retirement plans
    solid salaries
    a product people love
    four beers on tap
    stock

  20. get out of the way
    NO MEETINGS
    NO PLANNING SESSIONS
    NO NEED TO BE IN THE OFFICE
    chat, pull requests, email
    MORE DIRECT
    FASTER
    ALWAYS RECORDED

  21. This is designed to retain people.
    We’re at 56 employees. We haven’t lost one.
    This is a huge, massive competitive advantage.
    It justifies the extra expense.

  22. Communication.

  23. Don’t have the server guy who knows everything.
    the billing girl
    the testing dude
    the customer support maven
    the performance czar
    the software licensing file hoarder

  24. Don’t have the person who knows everything.

  25. Specialization is great,
    but only having one person
    is a synchronous bottleneck.

  26. Reduce institutional knowledge.

  27. Reduce institutional knowledge.
    wikis
    issues
    chat logs
    pull requests

  28. Every internal GitHub talk
    is automatically recorded,
    uploaded, and viewable to
    every future employee.

  29. ...on a Kinect-powered
    Arduino-based motion-
    detecting portable video
    recording platform.

  30. Your new hire is stoked to dive in,
    start reading, and start contributing
    ...so don’t get in their way.

  31. Hiring poorly is just as bad
    as losing people.

  32. Aim for really great people.

  33. WE ♥ SELF-STARTERS
    less babysitting, more code

  34. Keep your employees happy.
    Really happy.
    (future!)

  35. Don’t just market your product;
    market your team and company too.

  36. Always think
    about attracting
    good people,
    even if you’re
    not hiring.
    OPEN SOURCE
    CONFERENCES
    TECHNICAL POSTS
    SPONSORSHIPS
    MEETUPS
    TALKS

  37. Technical
    robots can be pretty finicky too

  38. hubot deploy github to production
    COMPILATION
    CoffeeScript
    SCSS and SASS
    bundles assets
    caches Python dependencies
    compiles Erlang changes
    compiles C changes
    builds static pages
    APP SETUP
    installs gems
    symlinks directories
    14 rolling app server restarts
    NOTIFY
    Campfire
    New Relic
    graphite
    [Diagram: grid of fe and fs servers]
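
The "rolling app server restarts" step above can be sketched roughly as follows. This is a minimal illustration, not GitHub's deploy code: the `restart` and `is_healthy` callbacks and the `fe1`..`fe14` host names are hypothetical, and reading "14 rolling restarts" as one restart per server is an assumption.

```python
from typing import Callable, Iterable, List


def rolling_restart(
    servers: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Restart servers one at a time, halting at the first unhealthy one.

    Returns the servers that were restarted and passed their health check.
    """
    restarted: List[str] = []
    for server in servers:
        restart(server)
        if not is_healthy(server):
            # Stop here so the remaining servers keep serving the old code.
            raise RuntimeError(f"{server} failed its health check; rollout halted")
        restarted.append(server)
    return restarted


if __name__ == "__main__":
    # Hypothetical host names; both callbacks are stand-ins for real tooling.
    app_servers = [f"fe{i}" for i in range(1, 15)]
    done = rolling_restart(
        app_servers,
        restart=lambda server: None,
        is_healthy=lambda server: True,
    )
    print(f"restarted {len(done)} servers")
```

Restarting one server at a time keeps most of the fleet serving traffic during a deploy, and halting on the first failed check limits the damage of a bad release.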

  39. deploys
    current process overview
    multi-server shell commands
    new employee setup
    app bootstrap

  40. Automating now will save you way
    more time down the road.

  41. Ship early, ship often.
    5x-30x
    deploys per day

  42. master = always deployable
    always green tests
    always a safe rollback

  43. Limit your deployments
    to staff-only
    to beta users only
    to one server only
    to one app process on one server only
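
One way to picture the staff-only and beta-only limits above is a small audience gate. The `User` and `Rollout` types and the stage names are hypothetical, invented for this sketch; the slides do not describe how GitHub implements the check.

```python
from dataclasses import dataclass


@dataclass
class User:
    login: str
    staff: bool = False
    beta: bool = False


@dataclass
class Rollout:
    """Which audience may see a newly deployed change (hypothetical model)."""

    stage: str = "staff"  # one of: "staff", "beta", "everyone"

    def enabled_for(self, user: User) -> bool:
        if self.stage == "everyone":
            return True
        if self.stage == "beta":
            # Staff keep access as the audience widens.
            return user.staff or user.beta
        if self.stage == "staff":
            return user.staff
        raise ValueError(f"unknown stage: {self.stage}")


if __name__ == "__main__":
    rollout = Rollout(stage="beta")
    for user in (User("staffer", staff=True), User("tester", beta=True), User("visitor")):
        print(user.login, rollout.enabled_for(user))
```

Widening `stage` step by step exposes a change to a larger audience only after the smaller one has exercised it.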

  44. @github tweets
    exceptions
    deploys
    deploys

  45. everyone loves fancy graphs
    quickly see trends
    quickly see problems
    historical data as basis for alerts

  46. METRICS ARE GREAT
    But use them wisely.

  47. 162ms
    average overall response time

  48. Valueless metric.

  49. 59ms
    average API response time
    with 4x throughput of web

  50. 23ms
    average raw response time
    with 2x throughput of web

  51. The responsiveness is a lie.

  52. 199ms
    average browser response time

  53. 16,000
    requests in the last week over 4.5s

  54. Needed to look at the
    right stuff.
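
A short sketch of why an overall average can hide slow requests: the sample latencies below are made up, and only the 4.5 s threshold comes from the slides. The `summarize` helper is a hypothetical name; it reports the mean next to a nearest-rank 99th percentile and a count of slow requests.

```python
import statistics
from typing import Dict, Sequence


def summarize(latencies_ms: Sequence[float], slow_threshold_ms: float = 4500.0) -> Dict[str, float]:
    """Report the mean alongside tail-focused numbers for one batch of requests."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Nearest-rank 99th percentile, using integer math for ceil(0.99 * n).
    rank = (99 * len(ordered) + 99) // 100
    return {
        "mean": statistics.fmean(ordered),
        "p99": ordered[rank - 1],
        "slow": sum(1 for ms in ordered if ms > slow_threshold_ms),
    }


if __name__ == "__main__":
    # Made-up sample: most requests are fast, a small fraction is very slow.
    sample = [100.0] * 980 + [6000.0] * 20
    print(summarize(sample))
```

On a sample like this the mean stays in the low hundreds of milliseconds while the percentile and the slow-request count expose the tail.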

  55. throttled google
    googlebot had
    2-3x throughput
    3-4x CPU usage
    compared to web requests

  56. Collect a lot of metrics,
    but make sure they’re
    important metrics.

  57. GitHub scale.

  58. Everyone has different
    growth patterns.

  59. GitHub has had three.

  60. major github infrastructure milestones
    2008: Launch
    2009: Bare metal servers
    2010: net-shard

  61. Launch
    2008
    Hosted on Engine Yard
    10 VMs
    54GB RAM
    shared GFS mount
    one metric shit-ton of caching

  62. Bare metal servers
    2009
    Hosted on Rackspace
    16 bare metal servers
    288GB of RAM
    redundant disk storage

  63. net-shard
    2010
    networks share a common repository
    classic: rails/rails with forks holman/rails (+1 commit) and github/rails (+30 commits)
    ...multiplied 2,600 times
    net-shard: one rails network repo shared by holman/rails, rails/rails, and github/rails
    fat network, skeleton forks

  64. net-shard
    2010
    networks share a common repository
    they also share the same fs and partition
    halves storage requirements
    improves hit rate of kernel disk cache
    speeds up backups
    allows fast forks, merge button, network GC
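
The idea that every fork in a network lands on the same fs and partition can be sketched as a routing function keyed on the network's root repository. This is an assumption-heavy illustration: the slides do not say how networks are assigned to file servers, so the SHA-1 hashing, the `parents` mapping, and the shard count are all hypothetical.

```python
import hashlib
from typing import Dict, Optional


def network_root(repo: str, parents: Dict[str, Optional[str]]) -> str:
    """Follow fork-parent links up to the repository that has no parent."""
    seen = set()
    while parents.get(repo) is not None:
        if repo in seen:
            raise ValueError(f"fork cycle at {repo}")
        seen.add(repo)
        repo = parents[repo]
    return repo


def shard_for(repo: str, parents: Dict[str, Optional[str]], shard_count: int) -> int:
    """Pick a shard from the network root, so all forks of a network co-locate."""
    root = network_root(repo, parents)
    digest = hashlib.sha1(root.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    # Repository names are the ones used on the slide; the mapping is illustrative.
    parents = {
        "rails/rails": None,
        "holman/rails": "rails/rails",
        "github/rails": "rails/rails",
    }
    for name in parents:
        print(name, shard_for(name, parents, shard_count=16))
```

Keying on the root rather than the fork name is what keeps a whole network together, which is the precondition for sharing objects between forks.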

  65. For GitHub, scaling involved a lot of
    predictions of future trends, then
    acting appropriately.

  66. Side Projects.

  67. A THOUGHT EXPERIMENT:
    Imagine I told you to build...

  68. This grew organically, over dozens of
    projects, written by dozens of employees,
    when they felt like it.

  69. Figure out how to let this happen. It’s hard.

  70. Small hack days can result in
    real, imma-make-us-money impact.

  71. Small hack days can also keep your
    developers insanely happy.

  72. Small hack days can also lead to
    learning new techniques.

  73. Projects and Posts.

  74. JENKINS + CAMPFIRE
    github.com/github/janky
    CHAT ROOM ROBOT
    github.com/github/hubot
    OFFICE MUSIC DJ
    github.com/holman/play

  75. BLOG: GITHUB IS MOVING TO RACKSPACE
    git.io/jByrlQ
    BLOG: HOW WE MADE GITHUB FAST
    git.io/p5v2Ag
    BLOG: UNICORN
    git.io/77Onfg

  76. Technical + Organizational

  77. Continually refine your
    process + workflow.

  78. Worry about your
    computers, and worry
    about your humans.

  79. ZACH HOLMAN
    zachholman.com/talks
    twitter+github: @holman
