
How to Dev&Ops Internal PaaS

taichi nakashima
June 29, 2015

Transcript

  1. HOW TO DEV&OPS INTERNAL PAAS

  2. TAICHI NAKASHIMA @deeeet @tcnksm

  3. INTERNAL PAAS? = PaaS for Rakuten engineers

  4. ONLY FOR TEST? = No. It receives production requests

  5. WHY PAAS? = Fast app experimentation and iteration with PROD-grade

  6. WHY PAAS? = You don’t need to prepare servers by yourself

  7. WHY PAAS? = You don’t need to provision servers by yourself

  8. WHY PAAS? = You don’t need to prepare DBs by yourself

  9. WHY PAAS? = You can scale your app with *one command*

  10. WHY PAAS? = You can focus on development, not deployment

  11. WHY INTERNAL PAAS? = Easy to connect with other internal services

  12. WHY INTERNAL PAAS? = Instant support when something happens

  13. WHY INTERNAL PAAS? (From an organizational point of view) = You can reduce duplicated tooling across different teams

  14. HOW LARGE? How many requests? Servers? Languages?

  15. 16000 req/sec: all application requests combined

  16. 2500 instances: 1400 (PROD) + 700 (STG) + 400 (DEV)

  17. 4300 VMs: 2800 (PROD) + 1200 (STG) + 300 (DEV)

  18. +300 VMs/month: growth forecast

  19. 4 supported languages: Ruby, Node.js, Java, PHP

  20. 3 DB services: Redis, MongoDB, Clustrix

  21. 100 Redis clusters: 230 instances

  22. 40 components: the components (roles) that run the PaaS

  23. 320 Chef recipes: `ls cookbooks/*/recipes | wc -l`
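
The one-liner on slide 23 is a quick count. When the glob matches several directories, `ls` also prints a `<directory>:` header for each one and a blank line between listings, so `wc -l` reports more lines than there are recipe files. A minimal Python sketch that counts only the files, assuming the usual Chef layout of one `.rb` file per recipe under `cookbooks/<cookbook>/recipes/` (`count_recipes` is a hypothetical helper):

```python
from pathlib import Path


def count_recipes(repo_root: str = ".") -> int:
    """Count Chef recipe files laid out as cookbooks/<cookbook>/recipes/*.rb."""
    root = Path(repo_root)
    # Match the recipe files themselves, so directory headers, blank lines
    # and non-Ruby files never enter the count.
    return sum(1 for p in root.glob("cookbooks/*/recipes/*.rb") if p.is_file())
```

From the shell, globbing the files directly (`ls cookbooks/*/recipes/*.rb | wc -l`) avoids the header lines in the same way.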

  24. 8 engineers: Dev & Ops, from 7 countries

  25. HOW TO DEV&OPS INTERNAL PAAS

  28. Architecture diagram: Router, API, Health Check, Messaging, DBs, Apps

  29. DEV FLOW / RELEASE FLOW

  31. Create ticket on JIRA → Write code → Write Chef cookbook → Test on LAB → Create PR (Git-Flow) → Review

  32. DEV FLOW / RELEASE FLOW

  33. Assign release manager → Collect all JIRA tickets → Write internal blog post → Canary release → Release

  34. 1 release per week: DEV (2 days), STG (2 days), PROD (3 days)
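
The stage lengths on slide 34 add up to exactly one week (2 + 2 + 3 = 7 days). The sketch below lays out one cycle's stage windows from a start date, assuming the stages run back to back in calendar days (the slide does not say how weekends are handled); `release_windows` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Stage lengths from the slide; reading them as calendar days is an assumption.
STAGES = [("DEV", 2), ("STG", 2), ("PROD", 3)]


def release_windows(start: date) -> List[Tuple[str, date, date]]:
    """Return (stage, first_day, last_day) for one weekly release cycle."""
    windows = []
    day = start
    for stage, length in STAGES:
        last = day + timedelta(days=length - 1)
        windows.append((stage, day, last))
        day = last + timedelta(days=1)  # the next stage starts the following day
    return windows


# 2 + 2 + 3 = 7, so each cycle ends just before the next weekly release.
assert sum(length for _, length in STAGES) == 7
```

For example, `release_windows(date(2015, 6, 29))` yields DEV on June 29-30, STG on July 1-2 and PROD on July 3-5, so the next cycle would start on July 6.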
  35. HOW TO RELEASE? = Chef + Capistrano

  36. RELEASE 1 SERVER

  37. Service-out → Run chef-solo → Run Serverspec → Service-in

  38. Service-out (Stop Load-Balancing → Disable Health Check → Stop monit) → Run chef-solo → Run Serverspec → Service-in (Start monit → Enable Health Check → Start Load-Balancing)

  39. Service-out (/etc/service-out) → Run chef-solo → Run Serverspec → Service-in (/etc/service-in)

  40. Every server has the same startup/stop scripts = the workflow is the same = automation is easy
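
Slides 37 to 40 describe the same ordered workflow on every server, with service-in undoing service-out in reverse order. A minimal Python sketch of that ordering, assuming a caller-supplied `run_step` that performs one step and reports success; on a real server the service-out and service-in groups would be the `/etc/service-out` and `/etc/service-in` scripts. The step strings, `release_one_server`, and the stop-at-first-failure policy are illustrative assumptions, not the deck's code.

```python
from typing import Callable, List

# Sub-steps of service-out and service-in as listed on the slide.
SERVICE_OUT = ["stop load-balancing", "disable health check", "stop monit"]
SERVICE_IN = ["start monit", "enable health check", "start load-balancing"]

# Full per-server release: service-out, converge with chef-solo,
# verify with Serverspec, then service-in.
STEPS = SERVICE_OUT + ["run chef-solo", "run serverspec"] + SERVICE_IN


def release_one_server(run_step: Callable[[str], bool]) -> List[str]:
    """Run STEPS in order and return the steps that were attempted.

    `run_step` performs one step and returns True on success. Stopping at
    the first failure, so a server whose Serverspec run fails is never put
    back in service, is an assumed policy.
    """
    attempted: List[str] = []
    for step in STEPS:
        attempted.append(step)
        if not run_step(step):
            break
    return attempted
```

Because every server runs the identical sequence, the same function can be applied to any host, which is what makes the multi-server automation on the following slides straightforward.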
  41. RELEASE X SERVERS

  42. Commands: `cap service-out`, `cap setup-role`, `cap service-in`. Steps: Service-out X servers → Run chef-solo on X servers → Run Serverspec on X servers → Service-in X servers

  43. Diagram: Operation; Role A, Role B, Role C; VMLIST: 170.20.20.21.RoleA to 170.20.20.27.RoleA

  44. `cap service-out` from Operation on every Role A VM in VMLIST (170.20.20.21.RoleA to 170.20.20.27.RoleA), in parallel

  45. `cap setup-role` from Operation on the same Role A VMs, in parallel

  46. `cap service-in` from Operation on the same Role A VMs, in parallel

  47. `cap service-out` on the Role B VMs (170.20.20.31.RoleB to 170.20.20.37.RoleB), in parallel

  48. Start from a canary: `cap service-out` on a single Role A VM (170.20.20.21.RoleA) first
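
Slides 42 to 48 roll a release out role by role: each Capistrano task runs on all VMs of a role in parallel, the next task starts only after that, and the whole rollout begins with a single canary. A sketch of that orchestration in Python, assuming VMLIST entries have the form `<IP>.<Role>` as on the slides and that a caller-supplied `run_phase(phase, host)` invokes the matching `cap` task; `rolling_release`, `run_batch`, `parse_vmlist` and the default of 8 workers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Callable, Dict, List

# The three Capistrano tasks from the slides, in the order they run.
PHASES = ["service-out", "setup-role", "service-in"]


def parse_vmlist(entries: List[str]) -> Dict[str, List[str]]:
    """Group VMLIST entries such as '170.20.20.21.RoleA' by role."""
    by_role: Dict[str, List[str]] = {}
    for entry in entries:
        ip, role = entry.rsplit(".", 1)  # the role is the last dot-separated field
        by_role.setdefault(role, []).append(ip)
    return by_role


def run_batch(hosts: List[str], run_phase: Callable[[str, str], None], workers: int) -> None:
    """Run each phase on every host in parallel; phases never overlap."""
    for phase in PHASES:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() waits for every host and re-raises a worker's exception
            list(pool.map(partial(run_phase, phase), hosts))


def rolling_release(
    entries: List[str],
    role_order: List[str],
    run_phase: Callable[[str, str], None],
    workers: int = 8,
) -> List[List[str]]:
    """Release one canary VM first, then the remaining VMs role by role."""
    by_role = parse_vmlist(entries)
    canary = by_role[role_order[0]][0]  # first VM of the first role
    batches = [[canary]]
    for role in role_order:
        rest = [ip for ip in by_role.get(role, []) if ip != canary]
        if rest:
            batches.append(rest)
    for batch in batches:
        run_batch(batch, run_phase, workers)
    return batches
```

Because `list(pool.map(...))` re-raises a failed host's exception, a failure in any phase stops the rollout before later phases or roles run, which is the point of releasing the canary first.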
  49. HOW TO DEV&OPS INTERNAL PAAS

  50. LOGGING / MONITORING / ALERT HANDLING / SUPPORT / IAAS

  52. 700 GB/day of logs: all logs produced in the PaaS

  53. LOGGING IN PAAS? = Application logs + Component logs

  54. APPLICATION LOG? = The PaaS should give users a way to debug

  55. Instant logs (real time), midterm logs (1-2 weeks), long-term logs (6 months)

  56. Instant log: Router, API, Health Check, Messaging, DBs, Apps

  57. Apps, Log Server, Object Storage: Instant log, Midterm log, Longterm log

  58. Apps, Log Server: Instant log, Midterm log; Hadoop (BigData team): Analytics

  59. Apps, Log Server: Instant log, Midterm log; Splunk: Dashboard
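
Slide 55 splits application logs into three tiers by how far back they reach. A sketch of choosing which tier to query for a log line of a given age; the tier names follow the slide, while the exact cut-offs (1 hour for "real time", 2 weeks, 182 days) and `tier_for` are assumptions.

```python
from datetime import timedelta
from typing import Optional

# Tiers from the slide: instant (real time), midterm (1-2 weeks),
# long-term (about 6 months). The exact cut-offs are hypothetical.
TIERS = [
    ("instant", timedelta(hours=1)),
    ("midterm", timedelta(weeks=2)),
    ("longterm", timedelta(days=182)),
]


def tier_for(age: timedelta) -> Optional[str]:
    """Return the shortest-retention tier that still covers a log of this age."""
    for name, retention in TIERS:
        if age <= retention:
            return name
    return None  # older than every tier: the log has expired
```

Checking the shortest-retention tier first sends recent lookups to the real-time stream and only falls back to the longer-lived stores for older logs.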

  60. COMPONENT LOG? = Logs we use to debug the PaaS itself
  61. Log Server → Object Storage

  62. Log Server → Object Storage: we can debug CF (Cloud Foundry) here

  63. Log Server → Object Storage (GlusterFS, LeoFS)

  64. Log Server → Object Storage (GlusterFS)

  65. LOGGING / METRICS / ALERT HANDLING / SUPPORT / IAAS

  66. OpenTSDB, Pandora FMS

  67. LOGGING / METRICS / ALERT HANDLING / SUPPORT / IAAS

  68. On-call: 1-week, 24-hour shifts with a primary & a sub admin

  69. Up to 2500 alert ✉/day. Need to fix…

  70. LOGGING / METRICS / ALERT HANDLING / SUPPORT / IAAS

  71. JIRA, HipChat: instant support is one of the *good* points of an internal PaaS

  72. LOGGING / METRICS / ALERT HANDLING / SUPPORT / IAAS

  73. IAAS: operating a PaaS also means operating the IaaS

  74. vSphere

  75. HOW TO BOOT SERVERS? = An internal tool similar to Terraform

  76. Booting Role A VMs on vSphere from Operation: `rvc create -c rvc.yml 170.20.20.21.RoleA`, where rvc.yml is `RoleA: {cpu: 2, mem: 8192}`; VMLIST: 170.20.20.21.RoleA, 170.20.20.22.RoleA, 170.20.20.23.RoleA

  79. Same rvc.yml, next VM in VMLIST: `rvc create -c rvc.yml 170.20.20.22.RoleA`

  80. Same rvc.yml, next VM in VMLIST: `rvc create -c rvc.yml 170.20.20.23.RoleA`

  81. Then `cap setup-role` from Operation on the new Role A VMs on vSphere: 170.20.20.21.RoleA, 170.20.20.22.RoleA, 170.20.20.23.RoleA

  83. Easy to boot & set up servers = as long as there are *physical resources*
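
Slides 76 to 81 boot each VM in VMLIST with `rvc create -c rvc.yml <VM>`, where rvc.yml holds the per-role spec (`RoleA: cpu: 2, mem: 8192`), and then run `cap setup-role` on the new VMs. Since the internal tool's behavior beyond those slides is not shown, this Python sketch only builds the command lines and does not execute them; `SPEC` mirrors the slide's rvc.yml, and `boot_plan` is hypothetical.

```python
from typing import Dict, List, Tuple

# Per-role VM spec, mirroring the rvc.yml shown on the slide.
SPEC: Dict[str, Dict[str, int]] = {"RoleA": {"cpu": 2, "mem": 8192}}

Command = List[str]


def boot_plan(
    vmlist: List[str], spec_file: str = "rvc.yml"
) -> List[Tuple[Command, Dict[str, int]]]:
    """Pair one `rvc create` command line with the role spec for each VM."""
    plan = []
    for vm in vmlist:
        role = vm.rsplit(".", 1)[1]  # entries look like '170.20.20.21.RoleA'
        if role not in SPEC:
            raise KeyError(f"no spec for role {role} in {spec_file}")
        plan.append((["rvc", "create", "-c", spec_file, vm], SPEC[role]))
    return plan
```

Keeping the spec per role rather than per VM is what makes adding capacity a matter of appending names to VMLIST, provided the physical resources exist.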
  84. FUTURE? = We are moving to *version 2*

  85. BE GOPHER: Cloud Foundry is moving from Ruby to Go

  86. NO FORK: everything goes upstream

  87. BE OPEN: building tools as OSS

  88. NO MORE TOO MUCH ✉: planning to use PagerDuty + Riemann
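
Slide 69 reports up to 2500 alert mails per day, and slide 88 plans PagerDuty + Riemann to cut that down. One common way to reduce alert floods is to throttle repeated alerts per key. Below is a minimal fixed-window throttle in Python, similar in spirit to Riemann's `throttle` stream (which passes at most n events per dt seconds). It is not Riemann or PagerDuty code, and `AlertThrottle`, the alert keys and the limits are hypothetical.

```python
from typing import Dict, Tuple


class AlertThrottle:
    """Pass at most `limit` alerts per key in each `window`-second window."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # key -> (start of the current window, alerts passed in it)
        self.state: Dict[str, Tuple[float, int]] = {}

    def allow(self, key: str, now: float) -> bool:
        """Return True if an alert for `key` at time `now` should be sent."""
        start, count = self.state.get(key, (now, 0))
        if now - start >= self.window:  # window expired: open a new one
            start, count = now, 0
        if count < self.limit:
            self.state[key] = (start, count + 1)
            return True
        self.state[key] = (start, count)
        return False
```

Throttling per key keeps a single noisy check from flooding the on-call admin while still letting alerts for other keys through immediately.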
  89. Log Server → Object Storage (GlusterFS, LeoFS)

  90. Object Storage (LeoFS), Kafka

  91. MORE FLEXIBLE LOG STACK: planning to use Apache Kafka

  92. NEW METRICS STACK: planning to use InfluxDB + Grafana

  93. CONTAINERS: planning to support Docker

  94. MORE HA: planning to run a Chaos Monkey

  95. NEW IAAS: migrating to OpenStack

  96. NEW IAAS: planning a hybrid cloud

  97. WE HAVE MANY CHALLENGES

  98. WE ARE HIRING: http://corp.rakuten.co.jp/careers/experienced/

  99. @deeeet