GOTO Copenhagen 2017: Shopify’s Architecture to Handle 80K RPS Sales

Video: https://www.youtube.com/watch?v=N8NWDHgWA28

What do you do when some of the most ubiquitous celebrity personalities launch products on your platform, driving tens of thousands of requests per second? You roll up your sleeves and architect for it. Throughout the past decade, Shopify's infrastructure has evolved to serve some of the largest online sales on the planet. In this talk, we dive into our multi-tenant architecture, which allows us to fail over between regions with zero downtime, move shops between shards, minimize the blast radius of catastrophes, and throttle requests and serve cache hits out of the load balancers. We'll walk through how this architecture served us beautifully to minimize risk during our ongoing, gradual migration to the Cloud.

Simon Hørup Eskildsen

October 03, 2017

Transcript

  1. Shopify is handling some of the largest sales in the world, from Kylie Jenner, Kanye, the Super Bowl, and others
  2. “We learned to absorb these shocks and become stronger as a result. [..] The school of hard knocks has taught us well.” (Tobi Lütke, CEO, in an internal essay on why we optimize for flash sales)
  3. 500K merchants powered; $5.8B processed in Q2 2017; 80K peak RPS; 40+ daily deploys; Ruby on Rails since 2006; 2000+ employees
  4. Diagram: multiple ISPs connect to Region A and Region B, each of which sends BGP ANNOUNCE 23.227.38.0/24; walrusser.myshopify.com is shown with the address 23.227.38.64
  5. OpenResty allows Lua scripting of your load balancers; it’s been one of the most impactful additions to our stack in recent memory. https://github.com/openresty/openresty
    Diagram: Nginx with OpenResty, with the modules Rule Banner, Kafka Logging, Edgecache, and Checkout Throttle
  6. worker_processes 1;
     error_log logs/error.log;
     events {
         worker_connections 1024;
     }
     http {
         server {
             listen 8080;
             location / {
                 default_type text/html;
                 content_by_lua '
                     ngx.say("<p>hello, world</p>")
                 ';
             }
         }
     }
  7. Bot squasher analyzes the Kafka stream of incoming requests to ban bots with a rule banner module
    Diagram: Nginx with OpenResty (Rule Banner, Kafka Logger), Kafka, Bot Squasher; example request POST /checkout; example rule BAN 23.227.38.178
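The slide does not say how Bot Squasher decides that a client is a bot. Purely as an illustration of the "analyze the request stream, emit BAN rules" shape, the Python sketch below uses a made-up heuristic (more than `max_checkouts` POST /checkout requests from one IP). The function name, the threshold, the record fields, and the address 203.0.113.7 are all assumptions, and a plain list stands in for the Kafka stream.

```python
from collections import Counter
from typing import Dict, Iterable, List


def squash_bots(requests: Iterable[Dict[str, str]], max_checkouts: int) -> List[str]:
    """Return BAN rules for IPs with more than max_checkouts checkout POSTs.

    The threshold heuristic is hypothetical; the deck does not describe
    the real detection logic.
    """
    counts = Counter(
        r["ip"]
        for r in requests
        if r["method"] == "POST" and r["path"] == "/checkout"
    )
    # Sorted so the rule list is deterministic.
    return [f"BAN {ip}" for ip, n in sorted(counts.items()) if n > max_checkouts]


# A list stands in for the Kafka stream of request logs.
stream = (
    [{"ip": "23.227.38.178", "method": "POST", "path": "/checkout"}] * 5
    + [{"ip": "203.0.113.7", "method": "POST", "path": "/checkout"}]
    + [{"ip": "203.0.113.7", "method": "GET", "path": "/collections/walruses"}]
)
rules = squash_bots(stream, max_checkouts=3)
print(rules)
```

In the architecture on the slide, rules like these would be consumed by the Rule Banner module in the load balancer rather than printed.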
  8. Edgecache can serve full page cache hits out of the load-balancers in microseconds
    Diagram: GET /collections/walruses reaches Nginx with OpenResty (Edgecache), backed by Memcached; a HIT is served directly, a MISS goes to the Web Process and the response FILLs the cache
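The HIT, MISS, and FILL labels on slide 8 describe a read-through cache. As a rough illustration only, the Python sketch below models that flow with a dict standing in for Memcached and a hypothetical `render` callable standing in for the web process. The real Edgecache runs as Lua inside Nginx with OpenResty, and its internals (cache keys, expiry, invalidation) are not shown in the deck.

```python
from typing import Callable, Dict, List, Tuple


class EdgeCache:
    """Toy read-through page cache; a dict stands in for Memcached."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._store: Dict[str, str] = {}
        self._render = render  # hypothetical stand-in for the web process

    def get(self, path: str) -> Tuple[str, str]:
        if path in self._store:
            return "HIT", self._store[path]  # served from the load balancer
        body = self._render(path)            # MISS: ask the web process
        self._store[path] = body             # FILL: remember the page
        return "MISS", body


render_calls: List[str] = []


def render(path: str) -> str:
    render_calls.append(path)
    return f"<html>{path}</html>"


cache = EdgeCache(render)
first = cache.get("/collections/walruses")
second = cache.get("/collections/walruses")
print(first[0], second[0], len(render_calls))
```

The second request never reaches `render`, which is the property that lets cache hits bypass the web tier.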
  9. Checkout Throttle throttles the number of customers in the processing-heavy checkout path
    Diagram: GET /checkout reaches Nginx with OpenResty (Checkout Throttle); the throttle either lets the request through to /checkout or queues it in /wait_area
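One way to picture the throttle on slide 9 is as an admission gate with a fixed capacity and a waiting queue. The Python sketch below is a minimal model under assumptions that are not in the deck: a fixed `capacity`, first-in-first-out admission, and an explicit `leave` call when a customer finishes checkout.

```python
from collections import deque
from typing import Deque, Set


class CheckoutThrottle:
    """Toy admission gate: at most `capacity` customers in checkout."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity           # assumed fixed limit
        self.active: Set[str] = set()      # customers currently in checkout
        self.waiting: Deque[str] = deque() # customers parked in the wait area

    def enter(self, customer: str) -> str:
        if customer in self.active:
            return "/checkout"
        if customer not in self.waiting:
            self.waiting.append(customer)
        # Admit only the head of the queue, and only if there is room (FIFO).
        if self.waiting[0] == customer and len(self.active) < self.capacity:
            self.waiting.popleft()
            self.active.add(customer)
            return "/checkout"
        return "/wait_area"

    def leave(self, customer: str) -> None:
        self.active.discard(customer)


throttle = CheckoutThrottle(capacity=2)
results = [throttle.enter(c) for c in ("alice", "bob", "carol")]
throttle.leave("alice")
retry = throttle.enter("carol")
print(results, retry)
```

A customer sent to the wait area simply retries; once a slot frees up, the head of the queue is admitted.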
  10. Data in Region A: shop1 shop4 shop9 shop17 shop72 shop3 shop72 shop92 shop18 shop64 shop22 shop88 shop0 shop52 shop23, spread across Pod 14, Pod 2, and Pod 7
  11. Each Pod in Region A (Pod 14, Pod 2, Pod 7) has its own MySQL, Redis, Memcache, and Cron
  12. Pod 14, Pod 2, and Pod 7 each have their own MySQL, Redis, Memcache, and Cron; Shared Workers serve all pods
  13. Pod 14, Pod 2, and Pod 7 each have their own MySQL, Redis, Memcache, and Cron; Shared Load Balancing serves all pods
  14. Diagram: Pod Balancer over Pod 14, Pod 2, and Pod 7; shops: shop1 shop4 shop9 shop17 shop72 shop3 shop72 shop92 shop18 shop64 shop22 shop88 shop0 shop52 shop23
  15. Diagram: Pod Balancer over Pod 14, Pod 2, and Pod 7; shops: shop1 shop4 shop9 shop17 shop72 shop3 shop72 shop92 shop18 shop64 shop22 shop88 shop0 shop52 shop23
  16. Diagram: Pod Balancer over Pod 14, Pod 2, and Pod 7; shops: shop1 shop4 shop9 shop17 shop72 shop3 shop72 shop92 shop18 shop64 shop22 shop88 shop0 shop52 shop23, plus shop98
  17. Diagram: Pod Balancer over Pod 14, Pod 2, and Pod 7; shops: shop1 shop4 shop9 shop17 shop72 shop3 shop72 shop92 shop18 shop64 shop22 shop88 shop0 shop52 shop23, plus shop98 shop99 shop100
  18. Diagram: Pod Balancer over Pod 14, Pod 2, Pod 7, and Pod 74; shops: shop1 shop4 shop9 shop17 shop72 shop3 shop72 shop92 shop18 shop64 shop22 shop88 shop0 shop52 shop23, plus shop98 shop99 shop100
  19. Diagram: Pod Balancer over Pod 14, Pod 2, Pod 7, and Pod 74; shops: shop1 shop4 shop9 shop17 shop72 shop3 shop72 shop92 shop18 shop64 shop22 shop88 shop0 shop52 shop23, plus shop98 shop99 shop100
  20. COPY SHOP from Source Pod 9 (MySQL, Redis) to Target Pod 23 (MySQL, Redis): SELECT * FROM products WHERE shop_id = 38493; SELECT * FROM orders WHERE shop_id = 38493
  21. COPY SHOP from Source Pod 9 (MySQL, Redis) to Target Pod 23 (MySQL, Redis): SELECT * FROM products WHERE shop_id = 38493; SELECT * FROM orders WHERE shop_id = 38493. NEW CHECKOUT during the copy: INSERT INTO CHECKOUTS …
  22. COPY SHOP_ID 238 from Source Pod 9 (MySQL, Redis) to Target Pod 23 (MySQL, Redis): SELECT * FROM products WHERE shop_id = 238; SELECT * FROM orders WHERE shop_id = 238. Bin Log: REPLICATE SHOP_ID 238, CHECKOUT id: 383293
  23. Source Pod 9 (MySQL, Redis), Target Pod 23 (MySQL, Redis): LOCK SHOP_ID 238; Routing: UPDATE SHOP_ID 238 pod_id=23
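Slides 20 to 23 suggest a four-step shop move: bulk copy, catch up from the bin log, lock the shop, update routing. The Python sketch below walks through that order with in-memory lists standing in for the MySQL tables and the bin log. The function name, the row shapes, and the final unlock step are assumptions made for illustration; the deck does not show what happens after the routing update.

```python
from typing import Dict, List

Row = Dict[str, int]


def move_shop(
    shop_id: int,
    source_rows: List[Row],
    binlog: List[Row],
    target_rows: List[Row],
    routing: Dict[int, Dict[str, object]],
    target_pod: int,
) -> None:
    # 1. COPY: bulk-copy the shop's existing rows (SELECT ... WHERE shop_id = ...).
    target_rows.extend(row for row in source_rows if row["shop_id"] == shop_id)
    # 2. REPLICATE: apply writes that landed on the source during the copy.
    target_rows.extend(row for row in binlog if row["shop_id"] == shop_id)
    # 3. LOCK: stop writes for this shop while routing changes.
    routing[shop_id]["locked"] = True
    # 4. UPDATE: point the shop at the target pod.
    routing[shop_id]["pod_id"] = target_pod
    # Assumption (not on the slides): unlock once routing is updated.
    routing[shop_id]["locked"] = False


source = [{"shop_id": 238, "order_id": 1}, {"shop_id": 7, "order_id": 2}]
binlog = [{"shop_id": 238, "checkout_id": 383293}]
target: List[Row] = []
routing: Dict[int, Dict[str, object]] = {238: {"pod_id": 9, "locked": False}}

move_shop(238, source, binlog, target, routing, target_pod=23)
print(target, routing)
```

Only rows for shop 238 reach the target, including the checkout that arrived mid-copy, and the routing entry ends up pointing at pod 23.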
  24. Diagram: Traffic enters Region A and Region B. Region A: Pod 7 active, Pod 2 inactive, Pod 14 active. Region B: Pod 2 active, Pod 7 and Pod 14 inactive. Sorting Hat: GET /products, Host: sneakershop.com. Routing: ROUTE sneakershop.com shop238 pod2:B
  25. Diagram labels: Traffic; Region A; Region B; Sorting Hat; Pod 2, Pod 7, and Pod 14, each marked Active or Inactive in each region
  26. Diagram labels: Traffic; Region A; Region B; Sorting Hat; Pod 2, Pod 7, and Pod 14, each marked Active or Inactive in each region
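The routing entry on slide 24 (ROUTE sneakershop.com shop238 pod2:B) reads like a two-step lookup: host to shop, then shop to the pod and the region where that pod is active. The Python sketch below models it with plain dicts. The table names and the function name are hypothetical, and the active regions for pod7 and pod14 are inferred from the diagram text.

```python
from typing import Dict, Tuple

# Hypothetical routing tables; only the sneakershop.com entry is on the slide.
SHOP_BY_HOST: Dict[str, str] = {"sneakershop.com": "shop238"}
POD_BY_SHOP: Dict[str, str] = {"shop238": "pod2"}
ACTIVE_REGION: Dict[str, str] = {"pod2": "B", "pod7": "A", "pod14": "A"}


def sorting_hat(host: str) -> Tuple[str, str]:
    """Resolve a Host header to (shop, 'pod:region')."""
    shop = SHOP_BY_HOST[host]
    pod = POD_BY_SHOP[shop]
    return shop, f"{pod}:{ACTIVE_REGION[pod]}"


before = sorting_hat("sneakershop.com")
ACTIVE_REGION["pod2"] = "A"  # moving a pod only changes one routing entry
after = sorting_hat("sneakershop.com")
print(before, after)
```

Because the region is looked up per pod, one pod can change regions without touching the routing of shops on other pods.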
  27. Update Routing for pod to target region (pod2:b -> pod2:a); Sorting Hat routes requests to target region; Disable cron in both regions; Fail over MySQL to target region; Enable cron in both regions; Transfer jobs to target region
  28. Pauser will pause requests in the middle of failovers to avoid serving errors
    Diagram: Nginx with OpenResty (Pauser, Queue, Throttle); POST /checkout (during failover) receives HTTP 200 (seconds later)
  29. Update Routing for pod to target region (pod2:b -> pod2:a); Sorting Hat routes requests to target region and pauses requests; Disable cron in both regions; Fail over MySQL to target region; Enable cron in both regions; Resume requests; Transfer jobs to target region
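The step order on slide 29, together with the Pauser on slide 28, can be sketched as a small state machine. The Python model below is an assumption-laden illustration: the `Pod` class, its field names, and the idea of passing in the requests that arrive mid-failover are all invented for the example. What it tries to show is the order of steps and that a paused request is held and then answered, rather than failed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pod:
    name: str
    routed_region: str   # where Sorting Hat sends traffic
    mysql_region: str    # where the writable MySQL lives
    jobs_region: str     # where background jobs run
    cron_enabled: bool = True
    paused: bool = False
    held: List[str] = field(default_factory=list)
    served: List[str] = field(default_factory=list)

    def handle(self, request: str) -> None:
        if self.paused:
            self.held.append(request)  # Pauser: hold instead of erroring
        else:
            self.served.append(f"{request} -> {self.name}:{self.mysql_region}")


def fail_over(pod: Pod, target: str, arriving: List[str]) -> None:
    pod.routed_region = target   # update routing, e.g. pod2:b -> pod2:a
    pod.paused = True            # route to target region and pause requests
    pod.cron_enabled = False     # disable cron in both regions
    for request in arriving:     # requests that arrive mid-failover are held
        pod.handle(request)
    pod.mysql_region = target    # fail over MySQL to target region
    pod.cron_enabled = True      # enable cron in both regions
    pod.paused = False           # resume requests
    held, pod.held = pod.held, []
    for request in held:
        pod.handle(request)
    pod.jobs_region = target     # transfer jobs to target region


pod2 = Pod("pod2", routed_region="b", mysql_region="b", jobs_region="b")
fail_over(pod2, "a", arriving=["POST /checkout"])
print(pod2.served)
```

The held checkout is served against the target region once MySQL has failed over, which corresponds to the "HTTP 200 (seconds later)" on slide 28.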
  30. Diagram: shops in Region A (shop1 shop4 shop9 shop17 shop72 shop3 shop72 shop92 shop18 shop64 shop22 shop88 shop0 shop52 shop23) alongside Cloud Region C