global infra • App built from scratch; infra lives in a new region • Building better infra than the existing one, based on our past experience with AWS EC2 and VPC in Japan • e.g. JP: CentOS → Ubuntu; US: Ubuntu only • e.g. JP: weird subnetting; US: private/public subnets
architecture • Route53 Latency Based Routing (ap-northeast-1 or us-east-1) • DNS returns the IP of the region closer to the resolver • If a requested service lives in the other region, reverse-proxy to that region • Also, terminating TCP/TLS as close to the user as possible is better for latency (but serving from only 2 regions is not enough…)
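The cross-region fallback on this slide can be sketched in a few lines. This is only an illustration: the service names and the `SERVICE_HOME_REGION` table are hypothetical, and the talk does not say how the real proxy decides where a service lives.

```python
from typing import Tuple

# Hypothetical mapping of service name -> region that hosts it.
# The real service layout is not described in the talk.
SERVICE_HOME_REGION = {
    "service-a": "ap-northeast-1",
    "service-b": "us-east-1",
}


def pick_upstream(service: str, local_region: str) -> Tuple[str, bool]:
    """Return (target_region, proxied_across_regions) for a request.

    local_region is the region DNS sent the user to (the closer one).
    If the service is hosted there, serve it locally; otherwise the
    edge reverse-proxies the request to the service's home region.
    """
    home = SERVICE_HOME_REGION[service]
    return home, home != local_region
```

A request for `service-a` that lands in us-east-1 would be proxied to ap-northeast-1, while TCP/TLS is still terminated in the region closer to the user.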
architecture • Rails app servers are capable of autoscaling • Using consul + consul-template to apply the latest instance list to configurations • Recent AWS Auto Scaling Groups (ASG) allow suspending actions via API, so the global relies on ASG (JP uses its own implementation)
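To show what "applying the latest instance list to configurations" amounts to, here is a minimal Python sketch of the rendering step. It is not consul-template itself, and the nginx-style `upstream` output and the `render_upstream` helper are assumptions made for illustration; the talk does not name the proxy being configured.

```python
from typing import Iterable, Tuple


def render_upstream(name: str, instances: Iterable[Tuple[str, int]]) -> str:
    """Render an nginx-style upstream block from (address, port) pairs.

    Instances are sorted so the output is stable: the same set of
    instances always yields the same file, which avoids needless
    reloads when the service catalog returns them in a different order.
    """
    lines = [f"upstream {name} {{"]
    for address, port in sorted(instances):
        lines.append(f"    server {address}:{port};")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

In the real setup the instance list would come from the Consul catalog, and the configuration would be re-rendered whenever an instance joins or leaves.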
web development • Global uses GitHub.com • CI runs on CircleCI.com • (JP uses GitHub Enterprise) • Deploy: Capistrano-based • A deploy server in us-east-1 runs Capistrano (because of latency, poor office internet, etc.)
peak traffic • Various! • The global has several moments in a year when a large increase in traffic is expected: • Ramadan & Eid al-Adha (esp. MENA, Indonesia) • Christmas • and more
Ramadan • Ramadan is the ninth month of the Islamic calendar • Muslims fast from dawn until sunset during Ramadan • They enjoy cooking after sunset • This is the biggest occasion in MENA/Indonesia, and we expect higher traffic than usual https://en.wikipedia.org/wiki/Ramadan
Ramadan Preparation • We survived Ramadan 2015, but we grew a lot between Ramadan 2015 and Ramadan 2016 • So we had to take extra care for the traffic expected in 2016. We didn’t think our infra and application could survive Ramadan without preparation.
Ramadan Preparation • So here is what we did: • DB migration: RDS MySQL (standard EBS) → Amazon Aurora for MySQL • Capacity: Expanding the target of autoscaling • CDN: Switching to Fastly • App: Making a lot of performance improvements
Ramadan 2016 • No critical issues, but • Far more logs than usual: disks filled up early, and we had to review log retention or implement S3 archival • Fixing slow queries became a higher priority: their impact was much larger than usual
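The log-retention review mentioned above boils down to a cutoff rule. The sketch below is one possible way to pick files to archive, not the team's actual tooling; `select_for_archival` and its parameters are hypothetical names, and the upload to S3 is left out.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def select_for_archival(
    files: Iterable[Tuple[str, datetime]],
    now: datetime,
    retention_days: int,
) -> List[str]:
    """Return paths of log files last modified before the retention cutoff.

    files is an iterable of (path, last_modified) pairs. Files returned
    here are candidates to upload to an archive (e.g. S3) and then
    delete locally to free disk space.
    """
    cutoff = now - timedelta(days=retention_days)
    return [path for path, modified in files if modified < cutoff]
```

Shortening `retention_days` during a traffic peak frees disk space sooner, at the cost of having fewer logs available locally.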
DevOps • A team of people with different cultures, languages, and skills • Building good relationships, e.g. by attending the developers’ camp • Spending a few days together is a good way to do this
DevOps • Requests come in via GitHub issues • Most of them are simple operation requests… • We have to reduce simple “applications” or operations by: • delegating permissions to devs • automation • Reduce SRE blockers to enable asynchronous work, because developers live all over the world
Plans 2017 • There are a lot of points to improve • Performance • Architecture • Developers’ Productivity • JP has a lot of useful things; time to import those into the global • Be good with developers (DevOps…!)
Conclusion • Building infrastructure that receives traffic from around the world is fun • Being on a team surrounded by people from around the world is also fun • Thanks!