Scalable Deployments - How we deploy Rails app to 150+ hosts in a minute

RubyKaigi edition

Sorah Fukumori

September 19, 2014

Transcript

  1. Scalable Deployments: How we deploy Rails app to 150+ hosts in a minute
  2. AD: ISUCON (LINE, Cookpad) http://isucon.net/
  3. OK

  4. Scalable Deployments: How we deploy Rails app to 150+ hosts in a minute
  5. TODAY I TALK ABOUT: How Cookpad performs deployments

  6. TOPICS NOT INCLUDED
    - Rails
    - Continuous Delivery
    - App servers’ auto scaling

  7. LINKS
    - Cookpad's deployment and auto scaling: https://speakerdeck.com/mirakui/cookpads-deployment-and-auto-scaling
    - Continuous Delivery in Cookpad: https://speakerdeck.com/takai/continuous-delivery-in-cookpad
  8. Shota Fukumori (@sora_h, sorah)
    - Cookpad Inc., Dev-Infra group
    - Ruby committer
  9. SO, DEPLOYMENTS

  10. deploy |diˈploi| verb [with obj.]: move (troops) into position for military
    (quoted from New Oxford American Dictionary 3rd edition © 2010, 2012 by Oxford University Press)
    photo: https://www.flickr.com/photos/thenationalguard/4401592829
  11. Software deployment is all of the activities that make a software system available for use. The general deployment process consists of several interrelated activities with possible transitions between them. These activities can occur at the producer side or at the consumer side or both. Because every software system is unique, the precise processes or procedures
    http://en.wikipedia.org/wiki/Software_deployment
  12. DEPLOYMENTS

  13. We perform deployment of the following Rails app:
    +----------------------+--------+--------+---------+---------+-----+-------+
    | Name                 |  Lines |    LOC | Classes | Methods | M/C | LOC/M |
    +----------------------+--------+--------+---------+---------+-----+-------+
    | Controllers          |  41964 |  33824 |     436 |    3397 |   7 |     7 |
    | Helpers              |  13296 |  10950 |      10 |    1289 | 128 |     6 |
    | Models               |  87626 |  69239 |    1530 |    7604 |   4 |     7 |
    | Mailers              |    300 |    240 |      11 |      26 |   2 |     7 |
    | Javascripts          |  38740 |  33240 |      34 |    4789 | 140 |     4 |
    | Libraries            |  56189 |  46375 |     532 |    4371 |   8 |     8 |
    | Async_view specs     |    247 |    212 |       0 |       0 |   0 |     0 |
    | Controller specs     |  55098 |  45557 |       7 |     117 |  16 |   387 |
    | Feature specs        |  36807 |  30226 |       0 |     165 |   0 |   181 |
    | Helper specs         |   3598 |   2956 |       0 |       7 |   0 |   420 |
    | Lib specs            |  21636 |  18095 |      27 |     124 |   4 |   143 |
    | Mailer specs         |    306 |    251 |       0 |       0 |   0 |     0 |
    | Policy specs         |   1594 |   1302 |       0 |       0 |   0 |     0 |
    | Request specs        |  28698 |  24526 |       0 |      11 |   0 |  2227 |
    | Routing specs        |    648 |    523 |       0 |       0 |   0 |     0 |
    | View specs           |    619 |    508 |       0 |       2 |   0 |   252 |
    | Worker specs         |    862 |    715 |       0 |       1 |   0 |   713 |
    +----------------------+--------+--------+---------+---------+-----+-------+
    | Total                | 388228 | 318739 |    2587 |   21903 |   8 |    12 |
    +----------------------+--------+--------+---------+---------+-----+-------+
  14. (zoom on the first rows of the table)
    +----------------------+--------+--------+---------+
    | Name                 |  Lines |    LOC | Classes |
    +----------------------+--------+--------+---------+
    | Controllers          |  41964 |  33824 |     436 |
    | Helpers              |  13296 |  10950 |      10 |
    | Models               |  87626 |  69239 |    1530 |
    | Mailers              |    300 |    240 |      11 |
    | Javascripts          |  38740 |  33240 |      34 |
    | Libraries            |  56189 |  46375 |     532 |
    | Controller specs     |  55098 |  45557 |       7 |
    | Feature specs        |  36807 |  30226 |       0 |
    | Helper specs         |   3598 |   2956 |       0 |
    | Lib specs            |  21636 |  18095 |      27 |
  15. (zoom on the last rows of the table)
    | Controller specs     |  55098 |  45557 |       7 |
    | Feature specs        |  36807 |  30226 |       0 |
    | Helper specs         |   3598 |   2956 |       0 |
    | Lib specs            |  21636 |  18095 |      27 |
    | Mailer specs         |    306 |    251 |       0 |
    | Policy specs         |   1594 |   1302 |       0 |
    | Request specs        |  28698 |  24526 |       0 |
    | Routing specs        |    648 |    523 |       0 |
    | View specs           |    619 |    508 |       0 |
    | Worker specs         |    862 |    715 |       0 |
    +----------------------+--------+--------+---------+
    | Total                | 388228 | 318739 |    2587 |
    +----------------------+--------+--------+---------+
  16. 140 servers, 10 times / day (peak)

  17. RULES ON DEPLOYMENT
    - Deploy only revisions whose CI build passes
    - Deploy only during working hours
    - After deployment, monitor errors for an hour
    - Roll back if the error rate increases or there is any trouble
  18. Our deployment was: [diagram: developer, git repo, CI; arrows labeled merge, build, pass, tag, check, deploy]
  19. Our deployment was: [diagram: Deploy Server running capistrano 2, connecting to each App server via ssh+rsync]
  20. Deploy via Chat [diagram labels: check, deploy!]

  21. How long does a deployment take? [diagram: developer, git repo, CI; stage times 10 min, 1..5 min, 10 min]
  22. How long does a deployment take? 15…20 min

  23. PROBLEMS

  24. PROBLEMS: Capistrano 2 with a complicated, very old deploy script. It seemed time to renew it.
  25. $ tree config
    config
    ├── cutty_deploy.rb
    ├── deploy
    │   ├── ***.rb
    │   ├── production.rb
    │   ├── production_test.rb
    │   ├── production_***.rb
    │   ├── production_***_test.rb
    │   ├── (snip)
    │   ├── rails41.rb
    │   ├── ruby210.rb
    │   ├── staging.rb
    │   ├── staging_***.rb
    │   └── ***_test.rb
    ├── deploy_support
    │   ├── bundler_capistrano.rb
    │   ├── chat_notification.rb
    │   ├── deploy_utils.rb
    │   └── rsync_with_remote_cache.rb
    :
  26. $ wc -l config/cutty_deploy.rb \
        config/deploy/* \
        config/deploy_support/*
    2595 total
  27. PROBLEMS
    - SSH is slow
    - High CPU usage on deployment
    - Sometimes fails
  28. PROBLEM: Sometimes fails

  29. How long does a deployment take? [diagram: developer, git repo, CI; stage times 10 min, 1..5 min, 10 min]
  30. How long does a deployment take? [diagram: developer, git repo, CI]
  31. How long does a deployment take? [diagram: failure (x) at 3 min, then retry]
  32. How long does a deployment take? [diagram: failure (x) at 3 min, retry, failure (x) at 3 min again …]
  33. How long does a deployment take? [diagram: stage times 10 min, 1..5 min, 10 min ?]
  34. How long does a deployment take? [diagram: stage times 10 min, 1..5 min, 10..20 min+]
  35. My team “dev-infra” aims to:
    - Improve developers’ productivity
    - Keep development fast
    - Maintain & improve the test environment
    - etc.
  36. WE HAVE TO IMPROVE, of course

  37. IMPROVEMENT PLANS: Upgrade to Capistrano 3? It has better SSH handling, but still depends on SSH. SSH is slow.
  38. CREATE A NEW TOOL! Create a new tool that uses another way for deployments!
  39. INTRODUCING sorah/mamiya (pronounced like mar-me-ya)

  40. MAMIYA
    - uses Serf for orchestration
    - uses Amazon S3 for file distribution (by default)
    - has a directory structure compatible with Capistrano
  41. SERF? (hashicorp/serf)
    - Orchestration tool
    - Decentralized, fault-tolerant, highly available
    - Uses a gossip protocol (SWIM)
  42. GOSSIP PROTOCOL? A gossip protocol is a style of computer-to-computer communication protocol inspired by the form of gossip seen in social networks.
    http://en.wikipedia.org/wiki/Gossip_protocol
  43. GOSSIP PROTOCOL: interval: 200ms, total nodes: 8, fanout: 2 (e = event) [diagram: 8 nodes, one holding the event]
  44. GOSSIP PROTOCOL [animation of 8 nodes over a 0ms, 200ms, 400ms, 600ms, 800ms timeline]: A node receives the event.
  45-46. GOSSIP PROTOCOL: The node chooses nodes to gossip to.
  47-56. GOSSIP PROTOCOL: Fan out. The event spreads to more nodes at each interval.
  57-67. GOSSIP PROTOCOL: A node drops a received event if it already knows it.
  68. GOSSIP PROTOCOL: Now all nodes have the event.
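The animation above can be reproduced with a small simulation. This is only a toy sketch using the parameters from the slide (8 nodes, fanout 2, one round per 200 ms interval); the function name `simulate_gossip` and its options are made up for illustration, and real Serf (SWIM) behaves differently in many details, such as limiting how often an event is retransmitted.

```ruby
# Toy simulation of gossip dissemination (not Serf's actual implementation).
# Each round stands for one 200 ms gossip interval.
def simulate_gossip(total_nodes: 8, fanout: 2, seed: 42, max_rounds: 100)
  rng = Random.new(seed)
  informed = [0] # node 0 receives the event first
  rounds = 0

  while informed.size < total_nodes && rounds < max_rounds
    rounds += 1
    newly_informed = []
    informed.each do |node|
      # Choose `fanout` random peers (excluding itself) to gossip to.
      peers = ((0...total_nodes).to_a - [node]).sample(fanout, random: rng)
      peers.each do |peer|
        # A peer that already knows the event drops the duplicate.
        next if informed.include?(peer) || newly_informed.include?(peer)
        newly_informed << peer
      end
    end
    informed.concat(newly_informed)
  end

  { rounds: rounds, informed: informed.sort }
end

result = simulate_gossip
puts "all #{result[:informed].size} nodes informed after #{result[:rounds]} rounds " \
     "(~#{result[:rounds] * 200} ms)"
```

With a fanout of 2 the number of informed nodes can at most triple per round, so 8 nodes need at least 2 rounds; the exact count depends on the random peer choices.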
  69. SERF (hashicorp/serf): Consumes UDP bandwidth proportional to cluster size, but is better than SSH.
  70. VILLEIN (sorah/villein)
    - A by-product gem of Mamiya
    - Simple gem to control `serf` from Ruby
  71. HOW IT WORKS

  72. Terminology & Concepts
    - Master node
    - Agent node
    - Package
    - Storage
    - Steps

  73. Terminology & Concepts: Master node
    - has an HTTP API to control the cluster
    - sends requests to agents (via serf)
    - watches agents’ status
  74. Terminology & Concepts: Agent node
    - accepts requests from the master node
    - runs deploy tasks
  75. Terminology & Concepts: Deploy Script
    - describes how to “build”, “prepare”, and “release”
  76. Terminology & Concepts: Package
    - is a tarball of files to deploy
    - can be pushed to storage
    - contains the Deploy Script
  77. Terminology & Concepts: Storage
    - can store Packages
    - is used by Agent nodes
  78. Terminology & Concepts: Step
    - is a part of deployment
    - can be run separately
    - is called remotely
  79. Steps
    - Fetch package from storage
    - Prepare fetched package (bundle install, etc.)
    - Switch to prepared package (reload, graceful)
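As an illustration of the "switch" step, here is a minimal sketch that repoints a `current` symlink at an already prepared release. It assumes a Capistrano-style layout (`releases/<name>` plus `current`), in line with the Capistrano-compatible directory structure mentioned earlier. The helper name `switch_release` and the release names are hypothetical; this is not Mamiya's actual code.

```ruby
require "fileutils"
require "tmpdir"

# Repoint `current` at a prepared release directory.
# Hypothetical helper; layout follows the Capistrano convention.
def switch_release(deploy_to, release_name)
  target = File.join(deploy_to, "releases", release_name)
  raise ArgumentError, "release not prepared: #{target}" unless Dir.exist?(target)

  current = File.join(deploy_to, "current")
  tmp_link = File.join(deploy_to, "current.tmp")

  # Build the new symlink under a temporary name, then rename it over the
  # old one; on POSIX systems the rename replaces the old link in one step.
  FileUtils.rm_f(tmp_link)
  File.symlink(target, tmp_link)
  File.rename(tmp_link, current)
  current
end

Dir.mktmpdir do |deploy_to|
  %w[20140919-a 20140919-b].each do |name|
    FileUtils.mkdir_p(File.join(deploy_to, "releases", name))
  end
  switch_release(deploy_to, "20140919-a")
  switch_release(deploy_to, "20140919-b")
  puts File.readlink(File.join(deploy_to, "current"))
end
```

Because fetching and preparing happen beforehand, the switch itself is cheap: only the symlink changes, followed by an application reload.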
  80. Mamiya’s deploy flow
    1. CI builds a package when the build passes
    2. CI pushes the package to storage
    3. (Deployment starts)
    4. Master sends a “prepare” request to Agents
    5. Agents fetch the package, then prepare it
  81. Mamiya’s deploy flow (continued)
    6. Master confirms all agents have prepared
    7. Master sends a “switch” request to Agents
    8. Agents switch symlinks, then reload the app process
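The flow above is essentially two phases: prepare everywhere, confirm, then switch everywhere. The following in-memory sketch shows that ordering only. The `Agent` and `Master` classes, their methods, and the package name are hypothetical and do not reflect Mamiya's real API; a real master would send these requests over Serf rather than by direct method calls.

```ruby
# In-memory sketch of the prepare-then-switch flow (hypothetical names).
class Agent
  attr_reader :name, :prepared, :current

  def initialize(name)
    @name = name
    @prepared = []
    @current = nil
  end

  # "prepare": fetch the package from storage and get it ready to run.
  def prepare(package, storage)
    raise "unknown package #{package}" unless storage.include?(package)
    @prepared << package unless @prepared.include?(package)
  end

  # "switch": only allowed for a package that was prepared earlier.
  def switch(package)
    raise "#{name}: #{package} is not prepared" unless @prepared.include?(package)
    @current = package
  end
end

class Master
  def initialize(agents, storage)
    @agents = agents
    @storage = storage
  end

  def prepare(package)
    @agents.each { |agent| agent.prepare(package, @storage) }
  end

  def deploy(package)
    prepare(package)
    # Confirm every agent has prepared before anyone switches.
    unprepared = @agents.reject { |agent| agent.prepared.include?(package) }
    raise "not prepared: #{unprepared.map(&:name).join(', ')}" unless unprepared.empty?
    @agents.each { |agent| agent.switch(package) }
  end
end

storage = ["app-2014-09-19.tar.gz"] # stands in for a package CI built and pushed
agents = (1..3).map { |i| Agent.new("app-#{i}") }
Master.new(agents, storage).deploy("app-2014-09-19.tar.gz")
puts agents.map { |a| "#{a.name}: #{a.current}" }
```

Keeping `prepare` as its own method reflects the idea that a step can be run separately from the rest of the deployment.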
  82. mamiya’s deploy flow [diagram: developer, CI, storage, app; arrows labeled merge, build+push, pass, prepare, check, deploy, reload]
  83. Result: Removed the dependency on slow SSH. But, more…?

  84. Terminology & Concepts: Step
    - is a part of deployment
    - can be run separately
    - is called remotely
  85. ANOTHER GOAL: Do the preparation before the developer says “DEPLOY!”

  86. mamiya’s deploy flow (prepare earlier) [diagram: developer, CI, storage, app; arrows labeled merge, build+push, pass, prepare, check, deploy, reload]
  87. Result (cap→mamiya), for 110 servers
    - Before: 8.4 minutes
    - After: 45 seconds
    - 11.2x faster!
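The speedup figure follows directly from the two timings on the slide; the variable names below are just for illustration.

```ruby
before_seconds = 8.4 * 60 # 8.4 minutes with Capistrano
after_seconds = 45.0      # 45 seconds with Mamiya
speedup = before_seconds / after_seconds
puts format("%.1fx faster", speedup)
```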
  88. DEMO

  89. Future Plans
    - Better documentation (soon)
    - Auto-deploy when joining cluster
    - Web UI
    - Better error tracking, handling
    - Incremental Packages
    - “master is always deployed”
  90. THANKS! sorah/mamiya. Questions? @sora_h