Slide 1

Slide 1 text

Scalable Deployments: How we deploy a Rails app to 150+ hosts in a minute

Slide 2

Slide 2 text

AD: ISUCON http://isucon.net/

Slide 3

Slide 3 text

OK

Slide 4

Slide 4 text

Scalable Deployments: How we deploy a Rails app to 150+ hosts in a minute

Slide 5

Slide 5 text

TODAY I'LL TALK ABOUT
How Cookpad performs deployments

Slide 6

Slide 6 text

TOPICS NOT INCLUDED
Rails
Continuous Delivery
App servers’ auto scaling

Slide 7

Slide 7 text

LINKS
Cookpad's deployment and auto scaling: https://speakerdeck.com/mirakui/cookpads-deployment-and-auto-scaling
Continuous Delivery in Cookpad: https://speakerdeck.com/takai/continuous-delivery-in-cookpad

Slide 8

Slide 8 text

Shota Fukumori (@sora_h, sorah)
Cookpad Inc. Dev-Infra group
Ruby committer

Slide 9

Slide 9 text

SO, DEPLOYMENTS

Slide 10

Slide 10 text

deploy |diˈploi| verb [with obj.]
move (troops) into position for military
quoted from New Oxford American Dictionary 3rd edition © 2010, 2012 by Oxford University Press
photo: https://www.flickr.com/photos/thenationalguard/4401592829

Slide 11

Slide 11 text

Software deployment is all of the activities that make a software system available for use.
The general deployment process consists of several interrelated activities with possible transitions between them. These activities can occur at the producer side or at the consumer side or both. Because every software system is unique, the precise processes or procedures
http://en.wikipedia.org/wiki/Software_deployment

Slide 12

Slide 12 text

DEPLOYMENTS

Slide 13

Slide 13 text

We perform deployment of the following Rails app:

+----------------------+--------+--------+---------+---------+-----+-------+
| Name                 |  Lines |    LOC | Classes | Methods | M/C | LOC/M |
+----------------------+--------+--------+---------+---------+-----+-------+
| Controllers          |  41964 |  33824 |     436 |    3397 |   7 |     7 |
| Helpers              |  13296 |  10950 |      10 |    1289 | 128 |     6 |
| Models               |  87626 |  69239 |    1530 |    7604 |   4 |     7 |
| Mailers              |    300 |    240 |      11 |      26 |   2 |     7 |
| Javascripts          |  38740 |  33240 |      34 |    4789 | 140 |     4 |
| Libraries            |  56189 |  46375 |     532 |    4371 |   8 |     8 |
| Async_view specs     |    247 |    212 |       0 |       0 |   0 |     0 |
| Controller specs     |  55098 |  45557 |       7 |     117 |  16 |   387 |
| Feature specs        |  36807 |  30226 |       0 |     165 |   0 |   181 |
| Helper specs         |   3598 |   2956 |       0 |       7 |   0 |   420 |
| Lib specs            |  21636 |  18095 |      27 |     124 |   4 |   143 |
| Mailer specs         |    306 |    251 |       0 |       0 |   0 |     0 |
| Policy specs         |   1594 |   1302 |       0 |       0 |   0 |     0 |
| Request specs        |  28698 |  24526 |       0 |      11 |   0 |  2227 |
| Routing specs        |    648 |    523 |       0 |       0 |   0 |     0 |
| View specs           |    619 |    508 |       0 |       2 |   0 |   252 |
| Worker specs         |    862 |    715 |       0 |       1 |   0 |   713 |
+----------------------+--------+--------+---------+---------+-----+-------+
| Total                | 388228 | 318739 |    2587 |   21903 |   8 |    12 |
+----------------------+--------+--------+---------+---------+-----+-------+

Slide 14

Slide 14 text

+----------------------+--------+--------+---------+
| Name                 |  Lines |    LOC | Classes |
+----------------------+--------+--------+---------+
| Controllers          |  41964 |  33824 |     436 |
| Helpers              |  13296 |  10950 |      10 |
| Models               |  87626 |  69239 |    1530 |
| Mailers              |    300 |    240 |      11 |
| Javascripts          |  38740 |  33240 |      34 |
| Libraries            |  56189 |  46375 |     532 |
| Controller specs     |  55098 |  45557 |       7 |
| Feature specs        |  36807 |  30226 |       0 |
| Helper specs         |   3598 |   2956 |       0 |
| Lib specs            |  21636 |  18095 |      27 |

Slide 15

Slide 15 text

| Controller specs     |  55098 |  45557 |       7 |
| Feature specs        |  36807 |  30226 |       0 |
| Helper specs         |   3598 |   2956 |       0 |
| Lib specs            |  21636 |  18095 |      27 |
| Mailer specs         |    306 |    251 |       0 |
| Policy specs         |   1594 |   1302 |       0 |
| Request specs        |  28698 |  24526 |       0 |
| Routing specs        |    648 |    523 |       0 |
| View specs           |    619 |    508 |       0 |
| Worker specs         |    862 |    715 |       0 |
+----------------------+--------+--------+---------+
| Total                | 388228 | 318739 |    2587 |
+----------------------+--------+--------+---------+

Slide 16

Slide 16 text

140 servers
10 times / day (peak)

Slide 17

Slide 17 text

RULES ON DEPLOYMENT
Deploy only revisions whose CI build passes
Deploy only during working hours
After deployment, monitor errors for an hour
Roll back if the error rate increases or any trouble occurs
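The rules above can be read as two small checks. This is a minimal sketch, not Cookpad's actual tooling: the method names `deployable?` and `rollback?`, the 10:00 to 19:00 window, and the weekend exclusion are all assumptions made for illustration, since the slide only says "working time".

```ruby
# Hypothetical working-hours window; the talk does not give exact hours.
WORKING_HOURS = (10...19)

# A revision may be deployed only if its CI build passed and it is working time.
# Treating weekends as non-working time is an assumption.
def deployable?(ci_passed:, now:)
  ci_passed && !now.saturday? && !now.sunday? && WORKING_HOURS.cover?(now.hour)
end

# Roll back if the error rate rose above the pre-deploy baseline,
# or if any other trouble was reported during the monitoring hour.
def rollback?(baseline_error_rate:, current_error_rate:, trouble_reported: false)
  trouble_reported || current_error_rate > baseline_error_rate
end
```

A real check would compare error rates with some tolerance rather than a strict greater-than; the strict comparison just keeps the sketch short.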

Slide 18

Slide 18 text

Our deployment was: [diagram: developer, git repo, CI; steps labeled merge, build, pass, tag, check, deploy]

Slide 19

Slide 19 text

Our deployment was: [diagram: Deploy Server running capistrano 2 reaches each App server over ssh+rsync]

Slide 20

Slide 20 text

Deploy via Chat [diagram: check, then "deploy!"]

Slide 21

Slide 21 text

How long does a deployment take? [diagram: developer, git repo, CI; stages take 10 min, 1..5 min, and 10 min]

Slide 22

Slide 22 text

How long does a deployment take? 15…20 min

Slide 23

Slide 23 text

PROBLEMS

Slide 24

Slide 24 text

PROBLEMS
Capistrano 2 with a complicated, very old deploy script
It seemed time to renew it

Slide 25

Slide 25 text

$ tree config
config
├── cutty_deploy.rb
├── deploy
│   ├── ***.rb
│   ├── production.rb
│   ├── production_test.rb
│   ├── production_***.rb
│   ├── production_***_test.rb
│   ├── (snip)
│   ├── rails41.rb
│   ├── ruby210.rb
│   ├── staging.rb
│   ├── staging_***.rb
│   └── ***_test.rb
├── deploy_support
│   ├── bundler_capistrano.rb
│   ├── chat_notification.rb
│   ├── deploy_utils.rb
│   └── rsync_with_remote_cache.rb
:

Slide 26

Slide 26 text

$ wc -l config/cutty_deploy.rb \
    config/deploy/* \
    config/deploy_support/*
2595 total

Slide 27

Slide 27 text

PROBLEMS
SSH is slow
High CPU usage on deployment
Sometimes fails

Slide 28

Slide 28 text

PROBLEM: Sometimes fails

Slide 29

Slide 29 text

How long does a deployment take? [diagram: developer, git repo, CI; stages take 10 min, 1..5 min, and 10 min]

Slide 30

Slide 30 text

How long does a deployment take? [diagram: developer, git repo, CI]

Slide 31

Slide 31 text

How long does a deployment take? [diagram: developer, git repo, CI; a step fails (x) after 3 min and is retried]

Slide 32

Slide 32 text

How long does a deployment take? [diagram: developer, git repo, CI; a step fails (x) after 3 min, is retried, and fails (x) again after 3 min …]

Slide 33

Slide 33 text

How long does a deployment take? [diagram: developer, git repo, CI; 10 min, 1..5 min, 10 min ?]

Slide 34

Slide 34 text

How long does a deployment take? [diagram: developer, git repo, CI; stages take 10 min, 1..5 min, and 10..20 min+]

Slide 35

Slide 35 text

My team “dev-infra” aims to:
Improve developers’ productivity
Keep development fast
Maintain & improve the test environment
etc.

Slide 36

Slide 36 text

Of course, WE HAVE TO IMPROVE

Slide 37

Slide 37 text

IMPROVEMENT PLANS
Upgrade to Capistrano 3?
It has better SSH handling, but it still depends on SSH, and SSH is slow.

Slide 38

Slide 38 text

CREATE A NEW TOOL!
Create a new tool that uses another way to deploy!

Slide 39

Slide 39 text

INTRODUCING sorah/mamiya (pronounced like mar-me-ya)

Slide 40

Slide 40 text

MAMIYA
Uses Serf for orchestration
Uses Amazon S3 for file distribution (by default)
Directory structure compatible with Capistrano

Slide 41

Slide 41 text

SERF? (hashicorp/serf)
Orchestration tool
Decentralized, fault-tolerant, highly available
Uses a gossip protocol (SWIM)

Slide 42

Slide 42 text

GOSSIP PROTOCOL?
A gossip protocol is a style of computer-to-computer communication protocol inspired by the form of gossip seen in social networks.
http://en.wikipedia.org/wiki/Gossip_protocol

Slide 43

Slide 43 text

GOSSIP PROTOCOL: interval: 200ms, total nodes: 8, fanout: 2 [diagram: 8 nodes; e = event]

Slide 44

Slide 44 text

GOSSIP PROTOCOL: [animation across slides 44 to 68: 8 nodes, events (e) spreading along a timeline of 0ms, 200ms, 400ms, 600ms, 800ms]
A node receives the event.
It chooses nodes to gossip to.
It fans out the event to them.
A node drops a received event if it already knows it.
Now all nodes have the event.
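The walkthrough above (interval 200ms, 8 nodes, fanout 2) can be approximated with a short simulation. This is a simplified push-gossip model written for illustration, not Serf's actual SWIM-based implementation; the method name `gossip_rounds` and the fixed random seed are assumptions.

```ruby
# Simulates push-style gossip: in every interval, each node that already
# knows the event picks `fanout` random peers and sends the event to them.
def gossip_rounds(total_nodes: 8, fanout: 2, interval_ms: 200, rng: Random.new(42))
  informed = [0]          # node 0 receives the event at 0ms
  elapsed_ms = 0
  until informed.size == total_nodes
    elapsed_ms += interval_ms
    newly_informed = []
    informed.each do |sender|
      peers = (0...total_nodes).to_a - [sender]
      peers.sample(fanout, random: rng).each do |peer|
        # A node that already knows the event drops the duplicate.
        next if informed.include?(peer) || newly_informed.include?(peer)
        newly_informed << peer
      end
    end
    informed.concat(newly_informed)
  end
  { elapsed_ms: elapsed_ms, informed: informed.sort }
end

result = gossip_rounds
puts "all #{result[:informed].size} nodes informed after #{result[:elapsed_ms]}ms"
```

Because the number of informed nodes can roughly triple per round with a fanout of 2, the event reaches the whole cluster in a handful of intervals, which is why the slides finish within the 800ms timeline.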

Slide 69

Slide 69 text

SERF (hashicorp/serf)
Consumes UDP bandwidth proportional to cluster size, but better than SSH.

Slide 70

Slide 70 text

VILLEIN (sorah/villein)
A by-product gem of Mamiya
A simple gem to control `serf` from Ruby

Slide 71

Slide 71 text

HOW IT WORKS

Slide 72

Slide 72 text

Terminology & Concepts
Master node
Agent node
Package
Storage
Steps

Slide 73

Slide 73 text

Terminology & Concepts: Master node
has an HTTP API to control the cluster
sends requests to agents (via serf)
watches agents’ status

Slide 74

Slide 74 text

Terminology & Concepts: Agent node
accepts requests from the master node
runs deploy tasks

Slide 75

Slide 75 text

Terminology & Concepts: Deploy Script
describes how to “build”, “prepare”, and “release”

Slide 76

Slide 76 text

Terminology & Concepts: Package
is a tarball of files to deploy
can be pushed to storage
contains the Deploy Script
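To make the idea of a package concrete, here is a minimal sketch that builds a gzipped tarball in memory and lists its entries, using only Ruby's standard library. It does not use Mamiya's own packaging code; the helper names `build_package` and `package_entries` and the file names inside the tarball are hypothetical.

```ruby
require "rubygems/package"
require "stringio"
require "zlib"

# Builds a gzipped tarball in memory from a { path => content } hash.
def build_package(files)
  tar_io = StringIO.new("".b)
  Gem::Package::TarWriter.new(tar_io) do |tar|
    files.each do |path, content|
      tar.add_file_simple(path, 0o644, content.bytesize) { |io| io.write(content) }
    end
  end
  Zlib.gzip(tar_io.string)
end

# Lists the entry names stored in a package built by build_package.
def package_entries(package)
  names = []
  Gem::Package::TarReader.new(StringIO.new(Zlib.gunzip(package))) do |tar|
    tar.each { |entry| names << entry.full_name }
  end
  names
end

# Hypothetical contents: application files plus the deploy script.
package = build_package(
  "app/models/recipe.rb" => "class Recipe; end\n",
  "deploy_script.rb" => "# how to build, prepare, and release\n"
)
puts package_entries(package)
```

Shipping the deploy script inside the package means an agent needs nothing but the tarball to know how to prepare and release that particular revision.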

Slide 77

Slide 77 text

Terminology & Concepts: Storage
stores Packages
is used by Agent nodes

Slide 78

Slide 78 text

Terminology & Concepts: Step
is a part of a deployment
can be run separately
can be called remotely

Slide 79

Slide 79 text

Steps
Fetch the package from storage
Prepare the fetched package (bundle install, etc.)
Switch to the prepared package (reload, graceful)
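The three steps can be sketched against a Capistrano-style layout (`releases/<name>` plus a `current` symlink). This is an illustration under stated assumptions, not Mamiya's code: a local directory stands in for storage, a marker file stands in for `bundle install`, and the function names are hypothetical. It assumes a POSIX filesystem, where renaming over a symlink is atomic.

```ruby
require "fileutils"
require "tmpdir"

# Fetch: copy the package from storage into releases/<name>.
# A local directory stands in for remote storage such as S3.
def fetch(storage_dir, deploy_to, name)
  release = File.join(deploy_to, "releases", name)
  FileUtils.mkdir_p(File.dirname(release))
  FileUtils.cp_r(File.join(storage_dir, name), release)
  release
end

# Prepare: do the slow work ahead of time (bundle install, etc.).
# Here a marker file stands in for that work.
def prepare(release)
  File.write(File.join(release, ".prepared"), "ok\n")
end

# Switch: repoint the `current` symlink at the prepared release.
# Creating a temporary link and renaming it over `current` avoids a
# moment where `current` does not exist.
def switch(deploy_to, release)
  raise "not prepared: #{release}" unless File.exist?(File.join(release, ".prepared"))
  tmp = File.join(deploy_to, "current.tmp")
  FileUtils.rm_f(tmp)
  File.symlink(release, tmp)
  File.rename(tmp, File.join(deploy_to, "current"))
end

Dir.mktmpdir do |root|
  storage = File.join(root, "storage")
  deploy_to = File.join(root, "app")
  FileUtils.mkdir_p(File.join(storage, "pkg-1"))
  FileUtils.mkdir_p(deploy_to)

  release = fetch(storage, deploy_to, "pkg-1")
  prepare(release)
  switch(deploy_to, release)
  puts File.readlink(File.join(deploy_to, "current"))
end
```

Keeping the steps separate is what lets the expensive fetch and prepare work run long before the cheap switch.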

Slide 80

Slide 80 text

Mamiya’s Deploy flow
1. CI builds the package when the build passes
2. CI pushes the package to storage
3. (Deployment starts)
4. Master sends a “prepare” request to Agents
5. Agents fetch the package, then prepare it

Slide 81

Slide 81 text

Mamiya’s Deploy flow
6. Master confirms all agents have prepared
7. Master sends a “switch” request to Agents
8. Agents switch symlinks, then reload the app process
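The master/agent part of this flow can be modeled in memory as follows. This is only a sketch of the sequencing (prepare everywhere, confirm, then switch); real Mamiya sends these requests over Serf, and the `Agent` and `Master` classes here are hypothetical stand-ins, not its API.

```ruby
# In-memory stand-in for an agent node.
class Agent
  attr_reader :name, :prepared, :current

  def initialize(name)
    @name = name
    @prepared = []   # packages fetched and prepared on this host
    @current = nil   # package the app is currently running
  end

  def prepare(package)
    @prepared << package unless @prepared.include?(package)
  end

  def switch(package)
    raise "#{name} has not prepared #{package}" unless @prepared.include?(package)
    @current = package
  end
end

# In-memory stand-in for the master node.
class Master
  def initialize(agents)
    @agents = agents
  end

  def prepare(package)
    @agents.each { |agent| agent.prepare(package) }
  end

  def all_prepared?(package)
    @agents.all? { |agent| agent.prepared.include?(package) }
  end

  # Send "prepare", confirm every agent is ready, then send "switch".
  def deploy(package)
    prepare(package)
    raise "some agents are not prepared" unless all_prepared?(package)
    @agents.each { |agent| agent.switch(package) }
  end
end

agents = Array.new(3) { |i| Agent.new("app-#{i}") }
master = Master.new(agents)
master.deploy("pkg-1")
puts agents.map { |agent| "#{agent.name}: #{agent.current}" }
```

Confirming that every agent has prepared before any of them switches keeps the cluster from running a mix of old and half-installed releases.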

Slide 82

Slide 82 text

Mamiya’s deploy flow [diagram: developer, CI, storage, app; steps labeled merge, pass, build+push, check, deploy, prepare, reload]

Slide 83

Slide 83 text

Result
Removed the dependency on slow SSH
but, more…?

Slide 84

Slide 84 text

Terminology & Concepts: Step
is a part of a deployment
can be run separately
can be called remotely

Slide 85

Slide 85 text

ANOTHER GOAL
Do the preparation before the developer says “DEPLOY!”

Slide 86

Slide 86 text

Mamiya’s deploy flow (prepare earlier) [diagram: developer, CI, storage, app; steps labeled merge, pass, build+push, prepare, check, deploy, reload]

Slide 87

Slide 87 text

Result (cap→mamiya)
Before: 8.4 minutes
After: 45 seconds
for 110 servers, 11.2x faster!
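The 11.2x figure follows directly from the two timings on the slide; a quick check:

```ruby
before_seconds = 8.4 * 60   # 8.4 minutes with Capistrano
after_seconds = 45          # 45 seconds with Mamiya
speedup = (before_seconds / after_seconds).round(1)
puts "#{speedup}x faster"   # prints "11.2x faster"
```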

Slide 88

Slide 88 text

DEMO

Slide 89

Slide 89 text

Future Plans
Better documentation (soon)
Auto-deploy when joining the cluster
Web UI
Better error tracking and handling
Incremental Packages
“master is always deployed”

Slide 90

Slide 90 text

THANKS
sorah/mamiya
Questions? @sora_h