
Scaling Rails for Black Friday / Cyber Monday at Shopify


Talk given at ConFoo 2015 on February 18th, 2015 and RailsConf 2015.

Christian Joudrey

February 18, 2015


Transcript

  1. scaling rails
    for
    Black Friday Cyber Monday


  2. cjoudrey @



  8. the stack


  9. nginx • unicorn
    rails 4 • ruby 2.1
    mysql 5.6 (percona)


  10. 95 app servers, 1,800 unicorn workers
    18 job servers, 1,400 job workers


  11. scale?



  13. 400,000 reqs/min


  14. $3.7B annual GMV
    that’s $7,000 per min


  15. Black Friday
    Cyber Monday


  16. Black Friday
    Cyber FUNday



  18. ~ 600,000 reqs/min


  19. 1 request = 1 process


  20. scale++
    ↓ resp. time ↑ workers
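
The trade-off on slides 19 and 20 can be put into numbers. This is a back-of-the-envelope sketch, assuming each unicorn worker serves exactly one request at a time, load is spread evenly, and nothing queues. The worker count and peak rate come from slides 10 and 18; the helper names are made up for illustration.

```ruby
# Capacity estimate for a one-request-per-process server.
# Assumes perfectly even load and no queueing; helper names are illustrative.

# Maximum sustainable requests per minute for a pool of workers.
def max_requests_per_minute(workers, avg_response_seconds)
  workers * (60.0 / avg_response_seconds)
end

# Largest average response time that still sustains the target rate.
def max_avg_response_seconds(workers, target_requests_per_minute)
  workers * 60.0 / target_requests_per_minute
end

WORKERS = 1_800     # unicorn workers (slide 10)
PEAK_RPM = 600_000  # approximate peak request rate (slide 18)

budget = max_avg_response_seconds(WORKERS, PEAK_RPM)
puts "Response-time budget at peak: #{(budget * 1000).round} ms"
puts "Capacity at 100 ms average: #{max_requests_per_minute(WORKERS, 0.1).round} reqs/min"
```

Under these assumptions the pool only keeps up with the peak if the average response stays below roughly 180 ms, which is why lowering response time and adding workers are the two levers.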


  21. page caching



  24. shopify/cacheable


  25. generational caching


  26. gzip • etag + 304 not modified
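
The "etag + 304 not modified" part can be illustrated without Rails. This is a minimal sketch in plain Ruby, assuming the ETag is simply an MD5 of the response body; `respond` is a hypothetical helper and not part of shopify/cacheable.

```ruby
require "digest/md5"

# Conditional GET in miniature: returns [status, headers, body].
# `respond` is a hypothetical helper, not an API of any library.
def respond(body, if_none_match: nil)
  etag = %("#{Digest::MD5.hexdigest(body)}")
  headers = { "ETag" => etag }

  if if_none_match == etag
    [304, headers, ""]   # the client's copy is current, so send no body
  else
    [200, headers, body] # first visit, or the content has changed
  end
end

status, headers, _body = respond("<h1>Posts</h1>")
puts status        # prints 200

status, _headers, body = respond("<h1>Posts</h1>", if_none_match: headers["ETag"])
puts status        # prints 304
puts body.empty?   # prints true
```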


  27. class PostsController < ApplicationController
        def index
          response_cache do
            @posts = @shop.posts.paginate(params[:page])
            respond_with(@posts)
          end
        end

        def cache_key_data
          {
            shop_id: @shop.id,
            path: request.path,
            format: request.format,
            params: params.slice(:page),
            shop_version: @shop.version
          }
        end
      end


  28. md5(
        {
          shop_id: 1,
          path: '/posts',
          format: 'text/html',
          params: { page: 2 },
          shop_version: 123
        }.to_s
      )

      GET /posts?page=2
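
The key derivation on slide 28 can be tried in plain Ruby. This is a minimal sketch that assumes the key is exactly the MD5 of the hash's string form, as the slide suggests; `response_cache_key` is a hypothetical helper, not the gem's API.

```ruby
require "digest/md5"

# Hashes the key data's string form with MD5, as on slide 28.
# `response_cache_key` is a hypothetical helper name.
def response_cache_key(data)
  Digest::MD5.hexdigest(data.to_s)
end

key_data = {
  shop_id: 1,
  path: "/posts",
  format: "text/html",
  params: { page: 2 },
  shop_version: 123
}

before = response_cache_key(key_data)

# Bumping the shop's version yields a different key, so entries written under
# the old version are never read again and can simply age out of the cache.
after = response_cache_key(key_data.merge(shop_version: 124))

puts before
puts after
puts before == after   # prints "false"
```

This is the generational part: nothing is deleted explicitly, the version bump just makes old keys unreachable. One caveat of the sketch is that `Hash#to_s` depends on key order and Ruby version, so a real implementation would likely want a canonical serialization.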



  30. sale


  31. query caching


  32. shopify/identity_cache


  33. full model caching


  34. opt-in by design


  35. after_commit expiry


  36. class Product < ActiveRecord::Base
        include IdentityCache
        has_many :images
        cache_has_many :images, :embed => true
      end

      product = Product.fetch(id)
      images = product.fetch_images


  37. class Product < ActiveRecord::Base
        include IdentityCache
        cache_index :shop_id, :handle, :unique => true
      end

      Product.fetch_by_shop_id_and_handle(shop_id, handle)
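
The ideas on slides 33 to 35 (full model caching, opt-in reads, expiry after a write commits) can be shown with a toy in-memory version. This is only a sketch: `FakeDatabase` and `ModelCache` are hypothetical stand-ins, and this is not how identity_cache is implemented.

```ruby
# Toy read-through model cache with write-time expiry, in plain Ruby.
# FakeDatabase and ModelCache are hypothetical names for illustration.

class FakeDatabase
  attr_reader :reads

  def initialize(rows)
    @rows = rows
    @reads = 0
  end

  def find(id)
    @reads += 1
    @rows[id]
  end

  def update(id, attrs)
    @rows[id] = @rows.fetch(id).merge(attrs)
  end
end

class ModelCache
  def initialize(db)
    @db = db
    @store = {} # stands in for memcached
  end

  # Opt-in cached read: callers choose `fetch` over a direct `find`.
  def fetch(id)
    @store.fetch(id) { @store[id] = @db.find(id) }
  end

  # Write to the database, then expire the cached copy,
  # mimicking an after-commit hook.
  def update(id, attrs)
    @db.update(id, attrs)
    @store.delete(id)
  end
end

db = FakeDatabase.new(1 => { title: "T-shirt" })
cache = ModelCache.new(db)

cache.fetch(1)                    # miss: reads the database
cache.fetch(1)                    # hit: served from the cache
cache.update(1, title: "Hoodie")  # expires the cached entry
puts cache.fetch(1)[:title]       # prints "Hoodie"
puts db.reads                     # prints 2
```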



  39. sale


  40. background jobs


  41. webhooks • emails • fraud detection • payment processing



  43. priority queues
    payment • default • low • realtime
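
One way to read the priority-queue slide: workers always take the next job from the most important non-empty queue. This is a sketch in plain Ruby. The slide only names the queues, so the ordering used here is an assumption, and `PriorityQueues` is a hypothetical class.

```ruby
# Workers drain queues strictly in priority order. The ordering below is an
# assumption for illustration; the slide only names the queues.
QUEUE_PRIORITY = %i[payment realtime default low].freeze

class PriorityQueues
  def initialize(order)
    @order = order
    @queues = order.to_h { |name| [name, []] }
  end

  def enqueue(queue, job)
    @queues.fetch(queue) << job
  end

  # Returns the next job from the highest-priority non-empty queue, or nil.
  def reserve
    @order.each do |name|
      job = @queues[name].shift
      return job if job
    end
    nil
  end
end

queues = PriorityQueues.new(QUEUE_PRIORITY)
queues.enqueue(:low, "send newsletter")
queues.enqueue(:default, "deliver webhook")
queues.enqueue(:payment, "capture payment")

3.times { puts queues.reserve }
```

Strict ordering like this can starve the low queue under sustained load, which is one reason throttling and separate worker pools tend to accompany it.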


  44. throttling


  45. now what?


  46. if it moves...
    measure it!


  47. statsd


  48. shopify/statsd-instrument


  49. Liquid::Template.extend StatsD::Instrument
      Liquid::Template.statsd_measure :render,
        'Liquid.Template.render'


  50. PaymentProcessingJob.statsd_count :perform,
        'PaymentProcessingJob.processed'
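
Slides 49 and 50 attach metrics to existing methods from the outside. The general technique can be sketched in plain Ruby with `Module#prepend`. The `Measure` module and its `measure` macro below are hypothetical and are not statsd-instrument's code; timings are kept in memory instead of being sent to a StatsD server.

```ruby
# Toy "measure this method" macro in plain Ruby.
# Measure and Measure.measure are hypothetical names for illustration.
module Measure
  TIMINGS = Hash.new { |hash, key| hash[key] = [] }

  # Wraps `method_name` so every call records its duration under `metric`.
  def measure(method_name, metric)
    wrapper = Module.new do
      define_method(method_name) do |*args, &block|
        started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        begin
          super(*args, &block)
        ensure
          elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
          Measure::TIMINGS[metric] << elapsed
        end
      end
    end
    prepend(wrapper)
  end
end

class Template
  extend Measure

  def render(name)
    "Hello, #{name}!"
  end
  measure :render, "Template.render"
end

puts Template.new.render("world")              # prints "Hello, world!"
puts Measure::TIMINGS["Template.render"].size  # prints 1
```

The instrumented class does not change; the wrapper sits in front of the original method, so the measurement can be added or removed in one line.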



  53. load testing


  54. genghis khan


  55. simulate
    Black Friday Cyber Monday
    before it happens


  56. several times per week


  57. slow queries


  58. # User@Host: shopify[shopify] @ [127.0.0.1]
      # Thread_id: 264419969 Schema: shopify Last_errno: 0 Killed: 0
      # Query_time: 0.150491 Lock_time: 0.000057 Rows_sent: 1 Rows_examined: 147841 Rows_affected: 0 Rows_read: 147841
      # Bytes_sent: 1214 Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0
      # InnoDB_trx_id: FF7021AAA
      # QC_Hit: No Full_scan: No Full_join: No Tmp_table: No Tmp_table_on_disk: No
      # Filesort: Yes Filesort_on_disk: No Merge_passes: 0
      # InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000
      # InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000
      # InnoDB_pages_distinct: 475
      SET timestamp=1393385020;
      SELECT `discounts`.* FROM `discounts`
      WHERE `discounts`.`shop_id` = 1745470 AND `discounts`.`status` = 'enabled'
      ORDER BY ISNULL(ends_at) DESC, ends_at DESC LIMIT 1


  59. determining
    root cause


  60. step 1: nginx request_id header
      https://github.com/newobj/nginx-x-rid-header

      proxy_set_header X-Request-ID "$request_id";
      log_format main '... $request_id';


  61. step 2: log_process_action (ActionController::Instrumentation)
      https://gist.github.com/mnutt/566725

      Completed 200 OK in 100ms (Views: 60ms | ActiveRecord: 40ms | request_id=bc12813bce...)


  62. step 3: basecamp/marginalia
      https://github.com/basecamp/marginalia

      User Load (0.3ms) SELECT `users`.* FROM `users`
      WHERE `users`.`id` = 1 LIMIT 1
      /*application:Shopify,controller:users,action:show,request_id:bc12813bce...*/


  63. # User@Host: shopify[shopify] @ [127.0.0.1]
      # Thread_id: 264419969 Schema: shopify Last_errno: 0 Killed: 0
      # Query_time: 0.150491 Lock_time: 0.000057 Rows_sent: 1 Rows_examined: 147841 Rows_affected: 0 Rows_read: 147841
      # Bytes_sent: 1214 Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0
      # InnoDB_trx_id: FF7021AAA
      # QC_Hit: No Full_scan: No Full_join: No Tmp_table: No Tmp_table_on_disk: No
      # Filesort: Yes Filesort_on_disk: No Merge_passes: 0
      # InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000
      # InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000
      # InnoDB_pages_distinct: 475
      SET timestamp=1393385020;
      SELECT `discounts`.* FROM `discounts`
      WHERE `discounts`.`shop_id` = 1745470 AND `discounts`.`status` = 'enabled'
      ORDER BY ISNULL(ends_at) DESC, ends_at DESC LIMIT 1
      /*application:Shopify,controller:orders,action:pay,request_id:bc12813bce...*/

      profit!


  64. access.log
    rails.log
    slow_query.log
    profit! (2)
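
With the same request ID present in all three logs, tracing one request end to end becomes a simple filter. This is a sketch in plain Ruby. The log lines and request IDs are simplified, made-up samples rather than real output, and `lines_for_request` is a hypothetical helper; in practice the lines would be read from the files themselves.

```ruby
# Collects every line that mentions one request ID, grouped by log name.
# All sample lines and IDs below are invented for illustration.
LOGS = {
  "access.log" => [
    "GET /orders/pay 200 req-42",
    "GET /posts 200 req-7"
  ],
  "rails.log" => [
    "Completed 200 OK in 100ms (request_id=req-42)",
    "Completed 200 OK in 12ms (request_id=req-7)"
  ],
  "slow_query.log" => [
    "SELECT ... FROM discounts /*controller:orders,action:pay,request_id:req-42*/"
  ]
}.freeze

# Returns a hash of log name => matching lines, omitting logs with no match.
def lines_for_request(logs, request_id)
  logs.each_with_object({}) do |(name, lines), found|
    matches = lines.select { |line| line.include?(request_id) }
    found[name] = matches unless matches.empty?
  end
end

lines_for_request(LOGS, "req-42").each do |name, lines|
  lines.each { |line| puts "#{name}: #{line}" }
end
```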


  65. bonus!
    background jobs


  66. schema migration
    with zero downtime


  67. soundcloud/lhm


  68. class NewIndexOnOrders < ActiveRecord::Migration
        def self.up
          Lhm.change_table :orders do |m|
            m.add_index [:shop_id, :customer_id]
          end
        end

        def self.down
          #
        end
      end


  69. orders 20140520_orders


  70. insert/delete/update triggers


  71. INSERT INTO ... SELECT ...
    insert/delete/update triggers


  72. async
    caveat


  73. resiliency


  74. A resilient system is one that functions
    with one or more components
    being unavailable or unacceptably slow.
    -


  75. don’t let minor dependencies
    take you down


  76. shopify/toxiproxy
    your app → toxiproxy ☢ → redis, memcached, sessions


  77. def load_customer
        if customer_id = session[:customer_id]
          @customer = Customer.find_by_id(customer_id)
        end
      end


  78. def load_customer
        if customer_id = session[:customer_id]
          @customer = Customer.find_by_id(customer_id)
        end
      rescue Sessions::DataStoreUnavailable
        @customer = nil
      end


  79. def test_storefront_resilient_to_sessions_down
        Toxiproxy[:sessions_data_store].down do
          get '/'
          assert_response :success
        end
      end


  80. rinse & repeat
    http://www.shopify.com/technology/16906928-building-and-testing-resilient-ruby-on-rails-applications


  81. what about
    slow resources?


  82. shard 1: shop 1, 2, 3
    shard 2: shop 4, 5, 6
    shard 3: shop 7, 8, 9


  83. rails request
    shard 1: shop 1, 2, 3
    shard 2: shop 4, 5, 6
    shard 3: shop 7, 8, 9


  84. rails request
    shard 1: shop 1, 2, 3
    shard 2: shop 4, 5, 6
    shard 3: shop 7, 8, 9


  85. how can we fail fast?


  86. shopify/semian
    smart circuit-breaker


  87. shopify/semian

      Semian.register(:mysql_shard_1,
                      tickets: 5,
                      timeout: 0.5,
                      error_threshold: 100,
                      error_timeout: 10,
                      success_threshold: 2)


  88. shopify/semian

      Semian[:mysql_shard_1].acquire do
        # Query the resource
      end
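
The circuit-breaker idea behind the `error_threshold`, `error_timeout` and `success_threshold` options can be sketched in a few lines of plain Ruby. This is a simplified illustration and not Semian's implementation: it counts consecutive errors, ignores the `tickets` limit entirely, and the `CircuitBreaker` class with its injectable `clock` is hypothetical.

```ruby
# Simplified circuit breaker: closed -> open -> half-open -> closed.
# Hypothetical class for illustration; not Semian's implementation.
class CircuitBreaker
  class OpenCircuitError < StandardError; end

  MONOTONIC_CLOCK = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  def initialize(error_threshold:, error_timeout:, success_threshold:, clock: MONOTONIC_CLOCK)
    @error_threshold = error_threshold
    @error_timeout = error_timeout
    @success_threshold = success_threshold
    @clock = clock
    @errors = 0
    @successes = 0
    @opened_at = nil
  end

  def call
    # While open and inside the timeout, fail fast without touching the resource.
    if @opened_at && @clock.call - @opened_at < @error_timeout
      raise OpenCircuitError, "circuit open"
    end

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end
    record_success
    result
  end

  private

  def record_failure
    @successes = 0
    @errors += 1
    # Open on reaching the threshold; a failed trial call re-opens immediately.
    @opened_at = @clock.call if @opened_at || @errors >= @error_threshold
  end

  def record_success
    if @opened_at
      @successes += 1
      close if @successes >= @success_threshold
    else
      @errors = 0
    end
  end

  def close
    @opened_at = nil
    @errors = 0
    @successes = 0
  end
end

now = 0.0
breaker = CircuitBreaker.new(error_threshold: 2, error_timeout: 10,
                             success_threshold: 1, clock: -> { now })

2.times do
  begin
    breaker.call { raise IOError, "shard 1 timed out" }
  rescue IOError
    # the first failures still reach the resource
  end
end

begin
  breaker.call { :never_runs }
rescue CircuitBreaker::OpenCircuitError => e
  puts "failing fast: #{e.message}"
end

now += 11                  # pretend the error timeout has elapsed
puts breaker.call { :ok }  # the trial call succeeds and closes the circuit
```

Once the circuit is open, requests for shops on the slow shard fail immediately instead of tying up a worker for the full timeout, which keeps workers free for the other shards.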


  89. rails request
    shard 1: shop 1, 2, 3
    shard 2: shop 4, 5, 6
    shard 3: shop 7, 8, 9


  90. what else can go wrong?


  91. Shopify
    Shipping rate providers (FedEx, UPS, USPS, etc.)
    Payment gateways (Stripe, PayPal, etc.)
    Fulfillment services (Shipwire, etc.)
    Internal services (MySQL, Memcached, etc.)


  92. manual circuit breakers
    around external
    dependencies


  93. thanks! :)
