How Shopify sharded Rails

Camilo Lopez
February 24, 2014

Talk presented at Big Ruby Conf 2014


Transcript

  1. You might be able to buy your way out

    Normally, just MOAR CPUs will do. Monday, February 24, 14
  2. Tune and cache

    Worked for us (tm) for a long time
  3. Why did we shard?

    If it does sound like a pain
  4. “InnoDB tries to limit thread concurrency allowing only innodb_thread_concurrency threads to run inside InnoDB kernel at the same time.”
  5. 2011 2012

    • 2 x Intel E5640 2.67 GHz • 192 GB of RAM • 2 x 300 GB OCZ Z-Drive R4 PCIe SSDs
  6. 2011 2012

    2011: • 2 x Intel E5640 2.67 GHz • 192 GB of RAM • 2 x 300 GB OCZ Z-Drive R4 PCIe SSDs
    2012: • 4 x Intel E5-4650 2.7 GHz • 256 GB of RAM • 2 x 600 GB OCZ Z-Drive
  7. Failure isolation

    If a shard fails, it’s still bad, but not as bad as if the only DB fails
  8. No mixing of shards in the same transaction

    We can get around it, and implementing distributed transactions is not trivial.
  9. No nested shard selection

    We use blocks to define shard contexts, but we also want developers to not have to think about it
  10. Had to happen “live”

    Product developers were pushing features all the time; there was no “stop and shard” moment
  11. Most things are scoped to a shop

    Yay! Easy: shop_id is the sharding key
  12. Add shard_id to shop

    We did this before we had any shards at all
  13. AUTO_INCR does not work anymore

    Ids will not be unique across DBs
  14. Works well

    Ran in production for only one table that was safe-ish to screw up
  15. next_id = auto_increment_offset + N × auto_increment_increment

    N is a member of 1,2,3,4...
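As a rough illustration of the formula on slide 15, the sketch below computes ids for three shards that share one increment but use different offsets, and checks that no id repeats across shards. The helper `id_for`, the three-shard setup, and the offsets 1 to 3 are assumptions made for the example, not details from the talk.

```ruby
# Hypothetical helper: the id a shard would produce for a given N,
# following the slide's formula offset + N * increment.
def id_for(n, offset:, increment:)
  offset + n * increment
end

# Assumed setup: 3 shards share auto_increment_increment = 3 and each
# gets its own auto_increment_offset (1, 2 or 3).
INCREMENT = 3

ids_by_shard = (1..3).to_h do |offset|
  [offset, (1..4).map { |n| id_for(n, offset: offset, increment: INCREMENT) }]
end

ids_by_shard.each { |offset, ids| puts "offset #{offset}: #{ids.inspect}" }

# Because the offsets differ modulo the increment, the series never overlap.
all_ids = ids_by_shard.values.flatten
puts "collisions: #{all_ids.size - all_ids.uniq.size}"
```

The increment has to be at least as large as the number of shards, otherwise two offsets would land in the same series.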
  16. Within the block sharded models use the current shard connection

    Sharded models include Sharding::Concern, have shop_id column.
  17. Started GET "/" for 127.0.0.1 at 2014-02-19 18:53:05 +0000
      Processing by ShopController#index as */*
      [Sharding::MissingShard] SELECT `products`.* FROM `products`*application
      Completed 302 Found in 175.2ms
  18. Started GET "/" for 127.0.0.1 at 2014-02-19 18:02:45 +0000
      Processing by ShopController#index as */*
      *********************************
      Exception Sharding::MissingShardError: MissingShardError: SELECT `products`.* FROM `products`
      Completed 500 Internal Server Error in 49.9ms
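The block-based shard context from slides 9 and 16, and the error raised on slide 18 when a sharded query runs outside of one, could look roughly like the sketch below. Only the names `Sharding` and `Sharding::MissingShardError` come from the slides; `with_shard`, `current_shard`, `NestedShardError`, the `Product` stub, and the connection naming are hypothetical and stand in for whatever the real library does.

```ruby
module Sharding
  class MissingShardError < StandardError; end
  class NestedShardError < StandardError; end # hypothetical name

  KEY = :sharding_current_shard

  # Hypothetical API: run the block with shard_id as the current shard.
  # Thread.current[] keeps the selection local to the running thread/fiber.
  def self.with_shard(shard_id)
    raise NestedShardError, "a shard is already selected" if Thread.current[KEY]

    Thread.current[KEY] = shard_id
    begin
      yield
    ensure
      Thread.current[KEY] = nil
    end
  end

  # Sharded models ask for the current shard; outside a block this raises.
  def self.current_shard
    Thread.current[KEY] ||
      raise(MissingShardError, "no shard selected for this query")
  end
end

# Stand-in for a sharded model; a real one would include Sharding::Concern.
class Product
  def self.connection_name
    "shard_#{Sharding.current_shard}"
  end
end

puts Sharding.with_shard(2) { Product.connection_name }

begin
  Product.connection_name
rescue Sharding::MissingShardError => e
  puts "MissingShardError: #{e.message}"
end
```

Refusing nested selection keeps the rule simple: one request or job, one shard, and anything that needs several shards has to go through a dedicated cross-shard helper.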
  19. Took months

    Devs were writing all sorts of new features while we worked on sharding
  20. Per shop shared lock

    Taken by the query-across-shards methods as they iterate; important for things that go across shards
  21. shop.is_locked

    No need to reach to a locking server, just HTTP 503 if set
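One way to picture the `shop.is_locked` check from slide 21 is a Rack-style middleware that answers 503 before the application is reached. This is a minimal sketch, not Shopify's code: the `Shop` struct, the `LockedShopGate` class, the lookup callable, and the `Retry-After` value are all assumptions for the example.

```ruby
# Stand-in for the shop record; only is_locked matters here.
Shop = Struct.new(:id, :shard_id, :is_locked)

# Rack-style middleware: any object with call(env) returning
# [status, headers, body].
class LockedShopGate
  # shop_lookup is a hypothetical callable that maps a request env to a Shop.
  def initialize(app, shop_lookup)
    @app = app
    @shop_lookup = shop_lookup
  end

  def call(env)
    shop = @shop_lookup.call(env)
    if shop && shop.is_locked
      # The flag lives on the shop row, so no separate lock server is asked.
      [503, { "Content-Type" => "text/plain", "Retry-After" => "30" },
       ["Shop is temporarily unavailable\n"]]
    else
      @app.call(env)
    end
  end
end

app = ->(_env) { [200, { "Content-Type" => "text/plain" }, ["ok\n"]] }
gate = LockedShopGate.new(app, ->(env) { env["demo.shop"] })

puts gate.call("demo.shop" => Shop.new(1, 1, true)).first
puts gate.call("demo.shop" => Shop.new(2, 1, false)).first
```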
  22. T0 T1 (+100secs) T2 T3 T4 T5

    Take global lock or bail, shop.is_locked = true
  23. T0 T1 (+100secs) T2 T3 T4 T5

    Take global lock or bail, shop.is_locked = true • Unicorns dead
  24. T0 T1 (+100secs) T2 T3 T4 T5

    Take global lock or bail, shop.is_locked = true • Unicorns dead • Jobs have released shared locks
  25. T0 T1 (+100secs) T2 T3 T4 T5

    Take global lock or bail, shop.is_locked = true • Unicorns dead • Jobs have released shared locks • Fork, copy, verify
  26. T0 T1 (+100secs) T2 T3 T4 T5

    Take global lock or bail, shop.is_locked = true • Unicorns dead • Jobs have released shared locks • Fork, copy, verify • Change shop.shard_id
  27. T0 T1 (+100secs) T2 T3 T4 T5

    Take global lock or bail, shop.is_locked = true • Unicorns dead • Jobs have released shared locks • Fork, copy, verify • Change shop.shard_id • shop.is_locked = false
  28. T0 T1 (+100secs) T2 T3 T4 T5

    Take global lock or bail, shop.is_locked = true • Unicorns dead • Jobs have released shared locks • Fork, copy, verify • Change shop.shard_id • shop.is_locked = false
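The shop move built up across slides 22 to 28 can be read as a six-step procedure. The sketch below follows the order in which the slides reveal the steps; treating that order as T0 to T5 is an assumption, as are the `ShopMover` class, its collaborators (plain callables here), and the use of a `Mutex` in place of whatever global lock the real system used.

```ruby
Shop = Struct.new(:id, :shard_id, :is_locked)

class ShopMover
  # global_lock:     stand-in for the global lock (a Mutex in this sketch)
  # jobs_done:       callable, true once no job holds the shop's shared lock
  # copy_and_verify: callable for the fork/copy/verify step, truthy on success
  # drain_seconds:   wait for in-flight web requests (the slides' +100secs)
  # sleeper:         injectable so demos and tests do not really sleep
  def initialize(global_lock:, jobs_done:, copy_and_verify:,
                 drain_seconds: 100, sleeper: ->(secs) { sleep(secs) })
    @global_lock = global_lock
    @jobs_done = jobs_done
    @copy_and_verify = copy_and_verify
    @drain_seconds = drain_seconds
    @sleeper = sleeper
  end

  def move(shop, to_shard)
    return :bailed unless @global_lock.try_lock      # take global lock or bail

    begin
      shop.is_locked = true                          # new requests get a 503
      @sleeper.call(@drain_seconds)                  # unicorns dead
      @sleeper.call(1) until @jobs_done.call(shop)   # shared locks released
      raise "verification failed" unless @copy_and_verify.call(shop, to_shard)
      shop.shard_id = to_shard                       # change shop.shard_id
      :moved
    ensure
      shop.is_locked = false                         # unlock even on failure
      @global_lock.unlock
    end
  end
end

shop = Shop.new(42, 1, false)
mover = ShopMover.new(
  global_lock: Mutex.new,
  jobs_done: ->(_shop) { true },
  copy_and_verify: ->(_shop, _to_shard) { true },
  sleeper: ->(_secs) {} # skip real waiting in this demo
)
puts mover.move(shop, 2)
puts shop.shard_id
```

If verification fails, `shard_id` is left untouched, so the shop keeps being served from its old shard and the copied rows become the kind of orphaned data a later cleanup has to remove.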
  29. Take shared locks while iterating

    The user decides what to do with locked shops: ignore or raise
  30. They also ignore data left orphaned

    It will get deleted, but that might take hours
  31. r = model.unscoped
      Sharding.find_in_batches_on_each_shard(r, on_lock: :ignore) do |recs|
        recs.each do |record|
          reserialize_record(record, params[:fields])
          progress.tick
        end
      end
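To make the `on_lock` choice from slides 29 to 31 concrete, here is a small in-memory sketch of a cross-shard batch iterator. It is not the real `Sharding.find_in_batches_on_each_shard`: the module name `ShardingSketch`, the hash-based data, the `ShopLockedError` class, and the lock counter are all assumptions chosen so the example runs without a database.

```ruby
module ShardingSketch
  class ShopLockedError < StandardError; end

  # records_by_shard: { shard_id => [{ id:, shop_id: }, ...] }, standing in
  #                   for the same table on each shard
  # locked_shop_ids:  shops whose is_locked flag is set
  # shared_locks:     Hash counting shared locks currently held per shop
  # on_lock:          :ignore skips locked shops, :raise aborts the iteration
  def self.find_in_batches_on_each_shard(records_by_shard, locked_shop_ids:,
                                         shared_locks: Hash.new(0),
                                         on_lock: :raise, batch_size: 2)
    records_by_shard.each_value do |records|
      usable = records.reject do |record|
        locked = locked_shop_ids.include?(record[:shop_id])
        if locked && on_lock == :raise
          raise ShopLockedError, "shop #{record[:shop_id]} is locked"
        end
        locked
      end

      usable.each_slice(batch_size) do |batch|
        shop_ids = batch.map { |record| record[:shop_id] }.uniq
        shop_ids.each { |id| shared_locks[id] += 1 } # take shared locks
        begin
          yield batch
        ensure
          shop_ids.each { |id| shared_locks[id] -= 1 } # release them
        end
      end
    end
  end
end

data = {
  1 => [{ id: 1, shop_id: 10 }, { id: 2, shop_id: 11 }, { id: 3, shop_id: 10 }],
  2 => [{ id: 4, shop_id: 20 }]
}

ShardingSketch.find_in_batches_on_each_shard(data, locked_shop_ids: [11],
                                             on_lock: :ignore) do |recs|
  puts recs.map { |record| record[:id] }.inspect
end
```

Holding the shared lock only for the duration of a batch is what lets a shop move wait for "jobs have released shared locks" without stalling for the whole iteration.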
  32. If someone absolutely must run custom SQL

    Take the global lock. Also, do not do this.
  33. Nov 2012 • Dec-Jan • March? • May • Jul-Oct • Oct-Nov

    Start DB-charmer, constraints and verifiers • Work, work, work • Look for vendors? #dunnolol • Move shops • Cybermonday 2012 • Query across DBs • OMG Cybermonday
  34. Sharding checklist

    • How to slice data • Primary key generation • Connection switching • Query across shards • Rebalancing