Scalable Deployments - How we deploy Rails app to 150+ hosts in a minute

RubyKaigi edition

Sorah Fukumori

September 19, 2014

Transcript

  1. Scalable Deployments
    How we deploy
    Rails app to
    150+ hosts in a minute

  2. AD
    http://isucon.net/

  3. OK

  4. Scalable Deployments
    How we deploy
    Rails app to
    150+ hosts in a minute

  5. TODAY I TALK ABOUT
    How Cookpad performs deployments

  6. TOPICS NOT INCLUDED
    Rails
    Continuous Delivery
    App servers’ auto scaling

  7. LINKS
    Cookpad's deployment and auto scaling
    https://speakerdeck.com/mirakui/cookpads-deployment-and-auto-scaling
    Continuous Delivery in Cookpad
    https://speakerdeck.com/takai/continuous-delivery-in-cookpad

  8. Cookpad Inc.
    Dev-Infra group
    Ruby committer
    soraher
    Shota Fukumori
    @sora_h
    sorah

  9. SO,
    DEPLOYMENTS

  10. deploy |diˈploi|
    verb [ with obj. ]
    move (troops) into position for military
    quoted from New Oxford American Dictionary 3rd edition © 2010, 2012 by Oxford University Press
    photo: https://www.flickr.com/photos/thenationalguard/4401592829

  11. Software deployment is all of the activities
    that make a software system available for use.
    The general deployment process consists of
    several interrelated activities with possible
    transitions between them. These activities can
    occur at the producer side or at the consumer
    side or both. Because every software system is
    unique, the precise processes or procedures
    http://en.wikipedia.org/wiki/Software_deployment

  12. DEPLOYMENTS

  13. We perform deployment of the following Rails app:
    +----------------------+--------+--------+---------+---------+-----+-------+
    | Name                 |  Lines |    LOC | Classes | Methods | M/C | LOC/M |
    +----------------------+--------+--------+---------+---------+-----+-------+
    | Controllers          |  41964 |  33824 |     436 |    3397 |   7 |     7 |
    | Helpers              |  13296 |  10950 |      10 |    1289 | 128 |     6 |
    | Models               |  87626 |  69239 |    1530 |    7604 |   4 |     7 |
    | Mailers              |    300 |    240 |      11 |      26 |   2 |     7 |
    | Javascripts          |  38740 |  33240 |      34 |    4789 | 140 |     4 |
    | Libraries            |  56189 |  46375 |     532 |    4371 |   8 |     8 |
    | Async_view specs     |    247 |    212 |       0 |       0 |   0 |     0 |
    | Controller specs     |  55098 |  45557 |       7 |     117 |  16 |   387 |
    | Feature specs        |  36807 |  30226 |       0 |     165 |   0 |   181 |
    | Helper specs         |   3598 |   2956 |       0 |       7 |   0 |   420 |
    | Lib specs            |  21636 |  18095 |      27 |     124 |   4 |   143 |
    | Mailer specs         |    306 |    251 |       0 |       0 |   0 |     0 |
    | Policy specs         |   1594 |   1302 |       0 |       0 |   0 |     0 |
    | Request specs        |  28698 |  24526 |       0 |      11 |   0 |  2227 |
    | Routing specs        |    648 |    523 |       0 |       0 |   0 |     0 |
    | View specs           |    619 |    508 |       0 |       2 |   0 |   252 |
    | Worker specs         |    862 |    715 |       0 |       1 |   0 |   713 |
    +----------------------+--------+--------+---------+---------+-----+-------+
    | Total                | 388228 | 318739 |    2587 |   21903 |   8 |    12 |
    +----------------------+--------+--------+---------+---------+-----+-------+

  16. 140 servers
    10 times / day
    (peak)

  17. RULES ON DEPLOYMENT
    Deploy only revisions whose CI build passes
    Deploy only during working hours
    After deployment, monitor errors for an hour
    Roll back if the error rate increases, or on any trouble

  18. Our deployment was:
    (diagram: developer, git repo, CI; steps: merge, build, pass, tag, check, deploy)

  19. Our deployment was:
    (diagram: Deploy Server to App, App, App, ... via ssh+rsync, capistrano 2)

  20. Deploy via Chat
    (diagram: check, deploy!)

  21. How long does a deployment take?
    (diagram: developer, git repo, CI; 10 min, 1..5 min, 10 min)

  22. How long does a deployment take?
    15…20 min

  23. PROBLEMS

  24. PROBLEMS
    Capistrano 2
    with a complicated, very old
    deploy script
    It seemed time to renew it

  25. $ tree config
    config
    ├── cutty_deploy.rb
    ├── deploy
    │   ├── ***.rb
    │   ├── production.rb
    │   ├── production_test.rb
    │   ├── production_***.rb
    │   ├── production_***_test.rb
    │   ├── (snip)
    │   ├── rails41.rb
    │   ├── ruby210.rb
    │   ├── staging.rb
    │   ├── staging_***.rb
    │   └── ***_test.rb
    ├── deploy_support
    │   ├── bundler_capistrano.rb
    │   ├── chat_notification.rb
    │   ├── deploy_utils.rb
    │   └── rsync_with_remote_cache.rb
    :

  26. $ wc -l config/cutty_deploy.rb \
        config/deploy/* \
        config/deploy_support/*
    2595 total

  27. PROBLEMS
    SSH is slow
    High CPU usage on deployment
    Sometimes fails

  28. PROBLEM:
    Sometimes fails

  29. How long does a deployment take?
    (diagram: developer, git repo, CI; 10 min, 1..5 min, 10 min)

  30. How long does a deployment take?
    (diagram: developer, git repo, CI)

  31. How long does a deployment take?
    (diagram: developer, git repo, CI; fails (x) after 3 min, retry)

  32. How long does a deployment take?
    (diagram: developer, git repo, CI; fails (x) after 3 min, retry, fails (x) again after 3 min …)

  33. How long does a deployment take?
    (diagram: developer, git repo, CI; 10 min, 1..5 min, 10 min ?)

  34. How long does a deployment take?
    (diagram: developer, git repo, CI; 10 min, 1..5 min, 10..20 min+)

  35. My team “dev-infra” aims to:
    Improve developers’ productivity
    Keep development fast
    Maintain & improve test environment
    etc.

  36. of course,
    WE HAVE TO
    IMPROVE

  37. IMPROVEMENT PLANS
    Upgrade to Capistrano 3?
    It has better SSH handling,
    but still depends on SSH.
    SSH is slow.

  38. CREATE A NEW TOOL!
    Create a new tool that uses another way
    for deployments!

  39. INTRODUCING
    GitHub: sorah/mamiya
    (pronounced like mar-me-ya)

  40. MAMIYA
    Uses Serf for orchestration
    Uses Amazon S3 for file distribution (by default)
    Directory structure compatible with Capistrano

  41. SERF?
    GitHub: hashicorp/serf
    Orchestration tool
    Decentralized, fault-tolerant, highly available
    Uses Gossip protocol (SWIM)

  42. GOSSIP PROTOCOL?
    A gossip protocol is a style of computer-to-computer
    communication protocol inspired by the form of gossip
    seen in social networks.
    http://en.wikipedia.org/wiki/Gossip_protocol

  43. GOSSIP PROTOCOL:
    interval: 200ms, total nodes: 8, fanout: 2
    (animation across slides 43 to 68; e = event; timeline: 0ms 200ms 400ms 600ms 800ms)
    1. A node receives the event.
    2. It chooses nodes to gossip to.
    3. It fans out the event to them.
    4. A node drops a received event if it is already known.
    Now all nodes have the event.
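
The gossip rounds on these slides can be imitated with a small simulation. This is a toy model written for illustration, not Serf's implementation: it assumes every informed node pushes the event to `FANOUT` randomly chosen peers once per interval, and that a node receiving an event it already knows simply ignores it. The function name `simulate_gossip` and the fixed random seed are assumptions of this sketch.

```ruby
# Toy push-gossip simulation; only the idea from the slides, not Serf itself.
INTERVAL_MS = 200 # gossip interval from the slide
NODES = 8         # total nodes from the slide
FANOUT = 2        # peers each informed node contacts per round

# Returns the number of informed nodes after each round,
# starting with the single node that first receives the event.
def simulate_gossip(nodes: NODES, fanout: FANOUT, rng: Random.new(42))
  informed = [0] # node 0 receives the event at 0 ms
  history = [informed.size]
  until informed.size == nodes
    targets = informed.flat_map do |sender|
      peers = (0...nodes).to_a - [sender]
      peers.sample(fanout, random: rng) # choose nodes to gossip to
    end
    informed = (informed + targets).uniq # already-known events change nothing
    history << informed.size
  end
  history
end

history = simulate_gossip
history.each_with_index do |count, round|
  puts "#{round * INTERVAL_MS}ms: #{count}/#{NODES} nodes have the event"
end
```

Because the informed set can at most triple per round with a fanout of 2, the event reaches a whole cluster in a number of rounds that grows roughly with the logarithm of the cluster size, which is why the approach scales to many hosts.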

  69. SERF
    Consumes UDP bandwidth proportional to
    cluster size, but better than SSH.
    GitHub: hashicorp/serf

  70. VILLEIN
    By-product gem of Mamiya
    Simple gem to control `serf` from Ruby
    GitHub: sorah/villein

  71. HOW IT WORKS

  72. Terminology & Concepts
    Master node
    Agent node
    Package
    Storage
    Steps

  73. Terminology & Concepts
    Master node
    has an HTTP API to control the cluster
    sends requests to agents (via serf)
    watches agents’ status

  74. Terminology & Concepts
    Agent node
    accepts requests from the master node
    runs deploy tasks

  75. Terminology & Concepts
    Deploy Script
    describes how to “build”, “prepare”,
    and “release”

  76. Terminology & Concepts
    Package
    is a tarball of files to deploy
    can be pushed to storage
    contains the Deploy Script

  77. Terminology & Concepts
    Storage
    stores Packages
    is used by Agent nodes

  78. Terminology & Concepts
    Step
    is part of a deployment
    can be run separately
    is called remotely

  79. Steps
    Fetch package from storage
    Prepare fetched package (bundle install, etc)
    Switch to prepared package (reload, graceful)
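
The "switch" step can be pictured with a Capistrano-style layout, where `current` is a symlink into `releases/`. The sketch below is an assumption about how such a switch might look, not Mamiya's code: the helper name `switch_release`, the `.tmp` suffix, and the release names are all made up for illustration. It creates the new link under a temporary name and renames it into place, so `current` is never missing.

```ruby
require "fileutils"
require "tmpdir"

# Points <deploy_to>/current at an already prepared release directory.
# Hypothetical helper; the name and layout are assumptions of this sketch.
def switch_release(deploy_to, release_name)
  target = File.join(deploy_to, "releases", release_name)
  raise ArgumentError, "release not prepared: #{target}" unless Dir.exist?(target)

  current = File.join(deploy_to, "current")
  tmp_link = "#{current}.tmp"
  FileUtils.rm_f(tmp_link)          # clear a leftover link from a failed run
  File.symlink(target, tmp_link)    # build the new link under a temporary name
  File.rename(tmp_link, current)    # rename replaces the old link in one step on POSIX
  current
end

Dir.mktmpdir do |deploy_to|
  %w[20140919-1 20140919-2].each do |name|
    FileUtils.mkdir_p(File.join(deploy_to, "releases", name))
  end
  switch_release(deploy_to, "20140919-1")
  switch_release(deploy_to, "20140919-2")
  puts File.readlink(File.join(deploy_to, "current"))
end
```

Reloading the app process after the switch is left out here; it depends on the app server in use.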

  80. Mamiya’s deploy flow
    1. CI builds a package when the build passes
    2. CI pushes the package to storage
    3. -- Deployment starts --
    4. Master sends a “prepare” request to Agents
    5. Agents fetch the package, then prepare it

  81. Mamiya’s deploy flow
    6. Master confirms all agents have prepared
    7. Master sends a “switch” request to Agents
    8. Agents switch symlinks, then reload the app process
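
Steps 4 to 8 of the flow above amount to a two-phase rollout: prepare everywhere, confirm, then switch. The following in-memory model is only a sketch of that ordering. The `Agent` class, the `deploy` method, the host names, and the package name are hypothetical and are not Mamiya's API; real agents would receive these requests over Serf and do actual file work.

```ruby
# Hypothetical in-memory model; class and method names are NOT Mamiya's API.
class Agent
  attr_reader :name, :current

  def initialize(name, healthy: true)
    @name = name
    @healthy = healthy
    @prepared = []
    @current = nil
  end

  # Steps 4-5: fetch the package from storage, then prepare it.
  def prepare(package)
    return false unless @healthy # stands in for a fetch or bundle failure
    @prepared << package
    true
  end

  # Step 8: switch to the prepared package (symlink + reload in the real flow).
  def switch(package)
    raise "#{name}: #{package} is not prepared" unless @prepared.include?(package)
    @current = package
  end
end

# The master's view: prepare on every agent, confirm, and only then switch.
def deploy(agents, package)
  failed = agents.reject { |agent| agent.prepare(package) }
  return { ok: false, failed: failed.map(&:name) } unless failed.empty? # step 6

  agents.each { |agent| agent.switch(package) } # step 7
  { ok: true, failed: [] }
end

agents = %w[app-001 app-002 app-003].map { |n| Agent.new(n) }
p deploy(agents, "pkg-abc123")
```

Keeping the confirmation between the two phases means a failed preparation on one host stops the rollout before any host has switched.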

  82. mamiya’s deploy flow
    (diagram: developer, CI, storage, app; steps: merge, pass, build+push, check, deploy, prepare, reload)

  83. Result
    Removed the dependency on slow SSH
    but, more…?

  84. Terminology & Concepts
    Step
    is part of a deployment
    can be run separately
    is called remotely

  85. ANOTHER GOAL
    Do the preparation
    before the developer says “DEPLOY!”

  86. mamiya’s deploy flow (prepare earlier)
    (diagram: developer, CI, storage, app; steps: merge, pass, build+push, prepare, check, deploy, reload app)

  87. Result (cap→mamiya)
    Before: 8.4 minutes
    After: 45 seconds
    for 110 servers, 11.2x faster!
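
The 11.2x figure follows directly from the two durations on the slide; a quick check, using only those numbers:

```ruby
before_seconds = 8.4 * 60 # 8.4 minutes with Capistrano, in seconds
after_seconds = 45.0      # 45 seconds with Mamiya
speedup = before_seconds / after_seconds
puts format("%.1fx faster", speedup) # prints "11.2x faster"
```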

  88. DEMO

  89. Future Plans
    Better documentation (soon)
    Auto-deploy when joining cluster
    Web UI
    Better error tracking, handling
    Incremental Packages
    “master is always deployed”

  90. THANKS
    GitHub: sorah/mamiya
    questions? @sora_h