Building infrastructure for our global service

About Cookpad Global's Infra: https://cookpad.com/uk

Sorah Fukumori

January 21, 2017

Transcript

  1. 2.

    $ whoami • Sorah Fukumori • https://sorah.jp/ | GitHub @sorah | Twitter @sora_h • Site Reliability Engineer at Cookpad Global • Cookpad TechConf 2017 NOC • Rubyist, Ruby committer • Interests: Site Reliability, Networking, Distributed systems
  2. 3.

  3. 5.
  4. 6.
  5. 7.

    Wi-Fi • Are you enjoying the internet? • It’s provided on a best-effort basis, but let us know on Twitter (#CookpadTechConf) if you’re having a problem
  6. 8.

    Agenda • Global? • About SRE team • Infra • Architecture • Traffic • Relationship with Developers • Plans 2017
  7. 9.
  8. 10.
  9. 11.

    global • https://cookpad.com/uk /es /ar /id /vn /sa … • Web / Android app & iOS app • 58 countries, in 15 languages (2/3)
  10. 14.
  11. 15.

    SRE team • 9 SRE members in JP • 2 members in JP are assigned to the global project (1/2)
  12. 16.

    SRE team • Also, we have 1 SRE member in the US • Recently joined! (2/2)
  13. 18.

    global infra • Nothing special to mention: currently it is just infrastructure for a plain Rails app • We do things as usual… for now. (1/3)
  14. 19.

    global infra • AWS us-east-1 • Amazon Aurora for MySQL • ElastiCache (Redis & memcached) • Ruby on Rails 4.2 on Ruby 2.3 • nginx + unicorn (2/3)
  15. 20.

    global infra • The app was built from scratch, and the infra lives in a new region • We are building better infra than the existing one, based on our past experience with AWS EC2 and VPC in Japan • e.g. JP: CentOS → Ubuntu, US: Ubuntu only • e.g. JP: weird subnetting, US: private/public subnets (3/3)
  16. 21.

    architecture • It’s basically simple: • ELB • EC2 (nginx) • EC2 (Rails, unicorn) • RDS (Aurora for MySQL), ElastiCache (Redis, memcached) (1/6)
  17. 22.

    architecture • cookpad.com is shared between the global service and the JP service • But the app is running in multiple regions…? (2/7)
  18. 23.

  19. 24.

    architecture • Route53 Latency Based Routing (ap-northeast-1 or us-east-1) • DNS returns the IP of the region closer to the resolver • If the requested service lives in the other region, the request is reverse-proxied to that region • Also, terminating TCP/TLS as close to the user as possible is better for latency. (But serving from only 2 regions is not enough…) (4/7)
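The "serve locally or reverse-proxy to the other region" decision described on this slide can be sketched roughly as follows. This is only an illustration, not the actual nginx configuration: the prefix list, the function names, and the assumption that the JP service lives in ap-northeast-1 while the global service lives in us-east-1 are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical: path prefixes that belong to the global service.
# They echo the country paths shown on the slides; the real rules are unknown.
GLOBAL_PREFIXES = ("/uk", "/es", "/ar", "/id", "/vn", "/sa")
GLOBAL_REGION = "us-east-1"   # region of the global service (from the slides)
JP_REGION = "ap-northeast-1"  # assumed region of the JP service


@dataclass(frozen=True)
class Decision:
    action: str  # "serve" (handle locally) or "proxy" (forward to the other region)
    region: str  # region that finally handles the request


def home_region(path: str) -> str:
    """Return the region that hosts the service for this URL path."""
    for prefix in GLOBAL_PREFIXES:
        # Match "/uk" and "/uk/...", but not "/ukulele".
        if path == prefix or path.startswith(prefix + "/"):
            return GLOBAL_REGION
    return JP_REGION


def route(path: str, edge_region: str) -> Decision:
    """Decide what the region chosen by latency-based DNS does with a request."""
    target = home_region(path)
    if target == edge_region:
        return Decision("serve", target)
    return Decision("proxy", target)
```

For example, a request for `/uk/recipes` that DNS sent to ap-northeast-1 would be proxied to us-east-1, while the same request arriving in us-east-1 would be served locally.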
  20. 26.

    architecture • Rails app servers can autoscale • We use consul + consul-template to apply the latest instance list to configurations • Recent AWS Auto Scaling groups (ASG) allow suspending actions via the API, so the global service relies on ASG (JP uses its own implementation) (6/7)
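The idea of "applying the latest instance list to configurations" can be illustrated with a small sketch. This is not consul-template itself (which uses its own template files); it only mimics the render-and-compare step in Python. The upstream name, port, and function names are assumptions made for the example.

```python
def render_upstream(name: str, instances: list[str], port: int = 8080) -> str:
    """Render an nginx upstream block from a list of instance addresses.

    The port and the upstream name are hypothetical; the real values are not
    given in the slides.
    """
    if not instances:
        # Refuse to write a config with no backends rather than break nginx.
        raise ValueError("refusing to render an empty upstream")
    lines = [f"upstream {name} {{"]
    # Deduplicate and sort so the output is stable across runs.
    for addr in sorted(set(instances)):
        lines.append(f"    server {addr}:{port};")
    lines.append("}")
    return "\n".join(lines) + "\n"


def needs_reload(current: str, rendered: str) -> bool:
    """Reload nginx only when the rendered config actually changed."""
    return current != rendered
```

Sorting and deduplicating the addresses keeps the rendered text identical when the instance set has not changed, so a reload is only triggered by a real scale-in or scale-out.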
  21. 27.

    architecture • Monitoring: Zabbix (lives in ap-northeast-1) • Connectivity to ap-northeast-1 is provided by VyOS + an IPsec tunnel • It has no perfect redundancy… but that is enough, because we do not allow critical traffic inside the tunnel (7/7)
  22. 28.

    web development • Global uses GitHub.com • and CI runs on CircleCI.com • (JP uses GitHub Enterprise) • Deploy: Capistrano-based • A deploy server in us-east-1 runs Capistrano (because of latency, poor office internet, … etc.)
  23. 30.

    peak traffic • In JP, the peak is around Valentine’s Day • Q. Then when does the peak come for the global service? (1/2)
  24. 31.

    peak traffic • It varies! • The global service has several moments in the year when we expect a large increase in traffic: • Ramadan & Eid al-Adha (esp. MENA, Indonesia) • Christmas • and more (2/2)
  25. 32.

    Ramadan • Ramadan is the ninth month of the Islamic calendar • During Ramadan, Muslims fast from dawn until sunset • They enjoy cooking after sunset • This is the biggest occasion in MENA/Indonesia, and we expect higher traffic than usual • https://en.wikipedia.org/wiki/Ramadan
  26. 33.

    Ramadan Preparation • We survived Ramadan 2015, but the service grew a lot between then and Ramadan 2016 • So we had to take extra care for the traffic expected in 2016. We did not think our infra and application could survive Ramadan without any preparation. (1/2)
  27. 34.

    Ramadan Preparation • So here is what we did: • DB migration: RDS MySQL (standard EBS) → Amazon Aurora for MySQL • Capacity: expanding the target of autoscaling • CDN: switching to Fastly • App: making a lot of performance improvements (2/2)
  28. 37.

    Ramadan 2016 • No critical issues, but • Far more logs than usual: disks filled up early, and we had to review log retention or implement S3 archival • Fixing slow queries became a higher priority: their impact was much larger than usual
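A log retention review like the one mentioned on this slide usually comes down to choosing which files are old enough to move off the local disk. The sketch below shows only that selection step under assumed inputs; the retention period, the file names, and the function name are hypothetical, and the actual upload to S3 is left out.

```python
from datetime import datetime, timedelta, timezone


def select_for_archive(files, now, keep_days):
    """Return names of log files older than keep_days, oldest first.

    files: iterable of (name, mtime) pairs, where mtime is a datetime.
    The caller would upload the returned files (for example to S3) and
    then delete them locally; that part is not shown here.
    """
    cutoff = now - timedelta(days=keep_days)
    old = [(mtime, name) for name, mtime in files if mtime < cutoff]
    return [name for _mtime, name in sorted(old)]
```

Returning the oldest files first means that an archival job interrupted halfway has still freed the space that was least likely to be needed.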
  29. 41.

    DevOps • A team of people with different cultures, languages, and skills • Building good relationships, for example by attending the developers’ camp • Spending a few days with people is a good way to do it (1/8)
  30. 42.

    DevOps • Dashboards • Grafana (with Zabbix + CloudWatch) to share server status • Kibana: importing SQL slow logs (2/8)
  31. 43.
  32. 44.
  33. 45.
  34. 46.

    DevOps • Requests come in as GitHub issues • Most of them are simple operation requests… • We have to reduce simple “applications” or operations by: • delegating permissions to developers • automation • Reduce SRE blockers to enable asynchronous work, because developers live all over the world (6/8)
  35. 47.
  36. 48.
  37. 49.

    Plans 2017 • There are a lot of points to improve • Performance • Architecture • Developers’ productivity • JP has a lot of useful things; it is time to bring those into global • Get along well with developers (DevOps…!) (1/2)
  38. 50.

    Plans 2017 • Better deploy • Docker, ECS (hako) • Dynamic staging servers • Delegation to dev • HTTP latency • CDN? • and more! (2/2)
  39. 51.

    Conclusion • Building infrastructure that receives traffic from around the world is fun • Working in a team with people from around the world is also fun • Thanks!