Abril Pro Ruby
Arthur Nogueira Neves
April 26, 2014
Sharding presentation given at abrilproruby.com/en
Transcript
Sharding Shopify: Arthur Neves / @arthurnn
Shopifacts
Scale
~100 app servers
~100k customers
Stack: Ruby 2.1.1, Rails 4.0, MySQL 5.6 (Percona)
App servers (Unicorn), job servers (Resque)
~150k RPM with peaks up to ~400k RPM
~20k QPS
Moneys
$1.6B annual GMV: sales across the whole platform
$3.7k / min
Outages
What is sharding?
Challenges of sharding: why you probably should not shard
Sharding: slicing your data over more than one database
Rails and libs assume one DB
Makes it hard to do aggregated queries: querying across DBs is a pain
AUTO_INCREMENT: does not quite work the same anymore
Likely custom code: be prepared to maintain it
You might be able to buy your way out: normally just MOAR CPUs will do
Tune and cache: worked for us (tm) for a long time
Why did we shard, if it sounds like such a pain?
Traffic and customers double each year
Black Friday / Cyber Monday: 2x normal load
You cannot cache writes
2011: 2 x Intel E5640 2.67 GHz, 192 GB of RAM, 2 x 300 GB OCZ Z-Drive R4 PCIe SSDs
2012: 4 x Intel E5-4650 2.7 GHz, 256 GB of RAM, 2 x 600 GB OCZ Z-Drive
Buy better CPUs? 8x CPUs get quite expensive
Horizontal scaling: n CPUs
Side benefits
Scale writes/reads horizontally
Smaller indexes: faster lookups
Failure isolation: if a shard fails, it’s still bad, but not as bad as if the only DB fails
Sharding checklist: things that needed to happen
How to slice our data? Sharding key (Rails code)
Primary key generation: simple AUTO_INCREMENT will not cut it anymore (MySQL config)
Connection switching: teach the Rails app which DB to talk to (Rails code)
What about JOINs and transactions? (Rails code)
Rebalancing: how do we move things across shards? (Rails code)
+ more
Sharding Shopify, or what did we do in 2013?
How did we slice our data? Sharding key
App servers (Unicorn), “The” Database
Most things are scoped to a shop. Yay! Easy: shop_id is the sharding key
Denormalize shop_id: makes life so much easier
Add shard_id to shop: we did this before we had any shards at all
[Diagram: shard, master DB]
Primary keys
AUTO_INCR does not work anymore: ids will not be unique across DBs
Noeqd: like Snowflake, a k-sortable id generator written in Go
Works well: ran in production for only one table that was safe-ish to screw up
One more service: do we really want that?
(ノಠ益ಠ)ノ⼺彡┻━┻
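To make "k-sortable id generator" concrete, here is a minimal Ruby sketch of a Snowflake-style generator. It is not Noeqd's code: the class name, the bit widths (10 bits of worker id, 12 bits of sequence) and the injectable clock are all assumptions chosen for illustration.

```ruby
# Hypothetical Snowflake-style generator (illustration only, not Noeqd).
# Id layout, high to low bits: millisecond timestamp | worker id | sequence.
class SortableIdGenerator
  WORKER_BITS   = 10 # assumed width
  SEQUENCE_BITS = 12 # assumed width
  MAX_SEQUENCE  = (1 << SEQUENCE_BITS) - 1

  def initialize(worker_id, clock: -> { (Time.now.to_f * 1000).to_i })
    unless (0...(1 << WORKER_BITS)).cover?(worker_id)
      raise ArgumentError, "worker_id out of range"
    end
    @worker_id = worker_id
    @clock     = clock
    @last_ms   = -1
    @sequence  = 0
    @mutex     = Mutex.new
  end

  def next_id
    @mutex.synchronize do
      now = @clock.call
      raise "clock moved backwards" if now < @last_ms

      if now == @last_ms
        @sequence = (@sequence + 1) & MAX_SEQUENCE
        if @sequence.zero?
          # Sequence exhausted for this millisecond: wait for the next one.
          now = @clock.call while now <= @last_ms
        end
      else
        @sequence = 0
      end

      @last_ms = now
      (now << (WORKER_BITS + SEQUENCE_BITS)) | (@worker_id << SEQUENCE_BITS) | @sequence
    end
  end
end

generator = SortableIdGenerator.new(1)
first_id  = generator.next_id
second_id = generator.next_id
```

Because the timestamp occupies the high bits, ids from one worker always increase, and ids from different workers sort roughly by creation time. The cost the slides point at remains: this is one more service to run.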
MySQL config
auto_increment_increment: controls the interval between successive column values.
auto_increment_offset: determines the starting point for the AUTO_INCREMENT column value.
next_id = auto_increment_offset + N × auto_increment_increment, where N is a member of 1, 2, 3, 4, ...
Fix on an increment, different offset per shard

shard     increment,offset   ids
shard_0   4,1                5, 9, 13, 17
shard_1   4,2                6, 10, 14, 18
shard_2   4,3                7, 11, 15, 19
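The formula and table above can be checked with a few lines of Ruby. This is only a sketch of the arithmetic from the slides; `shard_ids` is a hypothetical helper, not a MySQL or Rails API.

```ruby
# Hypothetical helper (not a MySQL or Rails API).
# Applies the slide's formula: next_id = offset + N * increment, N = 1, 2, 3, ...
def shard_ids(increment:, offset:, count:)
  (1..count).map { |n| offset + n * increment }
end

# Same increment everywhere, a different offset per shard, as in the table.
shard_settings = {
  "shard_0" => { increment: 4, offset: 1 },
  "shard_1" => { increment: 4, offset: 2 },
  "shard_2" => { increment: 4, offset: 3 },
}

ids_by_shard = shard_settings.transform_values do |settings|
  shard_ids(**settings, count: 4)
end

ids_by_shard.each { |name, ids| puts "#{name}: #{ids.join(', ')}" }

# No id is handed out by more than one shard.
all_ids = ids_by_shard.values.flatten
raise "id collision between shards" unless all_ids.uniq.size == all_ids.size
```

The increment has to be at least the number of shards; with an increment of 4 and three shards in use, one offset is still free.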
Make sure ids are big: Shopify/rails-bigint-pk
Connection switching: teach your app which DB to talk to
Block-based shard context

    Sharding.with_shard(id) do
      # stuff
    end

    Sharding.with_shop(shop) do
      # stuff
    end

    def do_stuff_in_shard(shard_id)
      Sharding.with_shard(shard_id) do
        # db stuff happens here!!!
      end
    end

Within the block, sharded models use the current shard connection
Sharded models include Sharding::Concern and have a shop_id column.

    class Order < ActiveRecord::Base
      include Sharding::Concern
    end

Non-sharded models do not care: they always talk to the master DB
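One way a block-based shard context like this could work is a thread-local "current shard" that is set for the duration of the block and restored afterwards. The sketch below assumes that design; `ShardContext` and all of its methods are hypothetical names, not the `Sharding` module from the talk, and it returns a connection name instead of switching real database connections.

```ruby
# Hypothetical sketch, not the talk's Sharding module.
module ShardContext
  KEY = :shard_context_current_shard

  # Runs the block with shard_id as the current shard. The previous value is
  # restored afterwards, even if the block raises, so calls can be nested.
  def self.with_shard(shard_id)
    previous = Thread.current[KEY]
    Thread.current[KEY] = shard_id
    yield
  ensure
    Thread.current[KEY] = previous
  end

  def self.current_shard
    Thread.current[KEY]
  end

  # Sharded models resolve to the current shard; all others use the master.
  def self.connection_name_for(sharded:)
    return :master unless sharded

    shard = current_shard
    raise "no shard selected" if shard.nil?
    :"shard_#{shard}"
  end
end

names = ShardContext.with_shard(2) do
  [
    ShardContext.connection_name_for(sharded: true),  # sharded model
    ShardContext.connection_name_for(sharded: false), # non-sharded model
  ]
end
```

Restoring the previous value in `ensure` is what makes it safe to nest one shard block inside another, or to reuse a worker thread for the next request.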
Connection Switching
HTTP requests
Domain to shop
Shop to shard: every shop has a shard_id.
Now run the request in a shard

    around_action :with_shop

    def with_shop
      @shop = ShopManager.shop_for(request.host)
      Sharding.with_shop(@shop) do
        yield # Actual request happens here.
      end
    end
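The two lookups behind that filter (domain to shop, then shop to shard) can be sketched with an in-memory directory. `Shop`, `ShopDirectory` and the example domains are hypothetical stand-ins; the talk does not show how `ShopManager.shop_for` is implemented.

```ruby
# Hypothetical in-memory stand-ins for the lookups described above.
Shop = Struct.new(:id, :domain, :shard_id)

class ShopDirectory
  def initialize(shops)
    @by_domain = shops.to_h { |shop| [shop.domain, shop] }
  end

  # Domain to shop: returns nil for hosts we do not know.
  def shop_for(host)
    @by_domain[host.downcase]
  end

  # Shop to shard: every shop carries its own shard_id.
  def shard_for(host)
    shop_for(host)&.shard_id
  end
end

directory = ShopDirectory.new([
  Shop.new(1, "snowdevil.example", 0),
  Shop.new(42, "tees.example", 2),
])

shard_id = directory.shard_for("tees.example")
```

Keeping `shard_id` on the shop record means routing needs a single lookup, and moving a shop later only requires changing that one field.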
Background jobs
If params[:shop_id] is present: easy

    class CommentSpamCheckerJob
      include BackgroundQueue::Low
      include Sharding::BackgroundQueue::SelectShard
      include BackgroundQueue::Locking

      # perform will have the shard context of params[:shop_id]
    end
What about JOINs and Transactions? :(
Can’t just iterate: a shop can be locked, and there are extraneous rows from moves
Take shared locks while iterating: the user decides what to do with locked shops, ignore or raise
They also ignore data left orphaned: it will get deleted, but that might take hours
Primitives
query_on_each_shard: results from an ActiveRecord::Relation on each shard

    things = []
    rel = Model.preload(:checkouts, :order)
               .where(id: params[:id].split("+"))

    Sharding.query_on_each_shard(rel, :on_lock => :raise) do |recs|
      things.concat(recs)
    end

find_in_batches_on_each_shard: like find_each, except it queries every shard

    r = model.unscoped

    Sharding.find_in_batches_on_each_shard(r, on_lock: :ignore) do |recs|
      recs.each do |record|
        reserialize_record(record, params[:fields])
        progress.tick
      end
    end
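The behaviour described for these primitives (visit every shard, skip orphaned rows, then ignore or raise on locked shops) can be modelled in memory. This is a sketch under stated assumptions: `each_shard_result`, `Row` and `ShopLockedError` are hypothetical names, shards are plain hashes, and a list of locked shop ids stands in for the shared locks the real methods take.

```ruby
# Hypothetical in-memory model of cross-shard iteration.
class ShopLockedError < StandardError; end

Row = Struct.new(:id, :shop_id)

# shards:          { shard_id => [Row, ...] }
# shop_shards:     { shop_id => shard_id the shop currently lives on }
# locked_shop_ids: shops that are locked, for example while being moved
# on_lock:         :ignore skips rows of locked shops, :raise aborts
def each_shard_result(shards, shop_shards:, locked_shop_ids: [], on_lock: :raise)
  shards.each do |shard_id, rows|
    # Drop orphaned rows: copies left on a shard the shop has moved away from.
    live = rows.select { |row| shop_shards[row.shop_id] == shard_id }
    locked, visible = live.partition { |row| locked_shop_ids.include?(row.shop_id) }

    if locked.any? && on_lock == :raise
      raise ShopLockedError, "shard #{shard_id} has rows for a locked shop"
    end

    yield shard_id, visible
  end
end

shards = {
  0 => [Row.new(1, 10), Row.new(2, 11)], # row 2 is orphaned: shop 11 lives on shard 1
  1 => [Row.new(3, 11), Row.new(4, 12)],
}
shop_shards = { 10 => 0, 11 => 1, 12 => 1 }

seen_ids = []
each_shard_result(shards, shop_shards: shop_shards,
                          locked_shop_ids: [12], on_lock: :ignore) do |_shard_id, rows|
  seen_ids.concat(rows.map(&:id))
end
```

With `on_lock: :ignore` the caller sees only rows 1 and 3. Passing `on_lock: :raise` for the same data raises as soon as the iteration reaches shard 1.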
If someone absolutely must run custom SQL: take the global lock. Also, do not do this.
Rebalancing: how do we move things across shards?
Lock-and-move
Global exclusive lock: maintenance tasks, some cron jobs and the shop mover
shop.is_locked: no need to reach out to a locking server, just HTTP 503 if set
Per-shop shared lock: taken by the query-across-shards methods as they iterate; important for things that go across shards
Zookeeper or Redis
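The shop.is_locked check above needs no lock server because the flag travels with the shop record. A minimal sketch of that idea as a Rack-style middleware follows; `LockedShopGuard`, `LockableShop` and the lookup lambda are hypothetical, and the talk does not say where in the request cycle the check actually happens.

```ruby
# Hypothetical Rack-style middleware (illustration only).
class LockedShopGuard
  def initialize(app, shop_lookup)
    @app         = app
    @shop_lookup = shop_lookup # callable: host -> shop, or nil if unknown
  end

  def call(env)
    shop = @shop_lookup.call(env["HTTP_HOST"])
    if shop && shop.is_locked
      # The flag is read from the shop itself, so no lock server is contacted.
      return [503, { "Content-Type" => "text/plain" }, ["Shop temporarily unavailable\n"]]
    end
    @app.call(env)
  end
end

LockableShop = Struct.new(:id, :is_locked)

shops = {
  "tees.example" => LockableShop.new(42, true),
  "hats.example" => LockableShop.new(7, false),
}
app   = ->(_env) { [200, { "Content-Type" => "text/plain" }, ["ok\n"]] }
guard = LockedShopGuard.new(app, ->(host) { shops[host] })

locked_status, = guard.call("HTTP_HOST" => "tees.example")
open_status,   = guard.call("HTTP_HOST" => "hats.example")
```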
$ script/move_shop --shop_id=42 --dest_id=2 --concurrency=16
Shop move timeline (T0, T1 (+100secs), T2, T3, T4, T5), steps in the order they appear:
• Take global lock or bail, shop.is_locked = true
• Unicorns dead
• Jobs have released shared locks
• Fork, copy, verify
• Change shop.shard_id
• shop.is_locked = false
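The move sequence can be walked through with an in-memory sketch. Everything here is hypothetical: `ShopMover`, `MovableShop` and the nested hashes standing in for shards are illustrative, and the waiting steps (old Unicorn workers exiting, jobs releasing their shared locks) are left out because they depend on the real infrastructure.

```ruby
# Hypothetical in-memory walk-through of lock-and-move.
MovableShop = Struct.new(:id, :shard_id, :is_locked)

class ShopMover
  # shards: { shard_id => { shop_id => [rows] } }
  def initialize(shards)
    @shards      = shards
    @global_lock = Mutex.new # stand-in for the global exclusive lock
  end

  # Returns false if the global lock is already held ("or bail").
  def move(shop, dest_id)
    return false unless @global_lock.try_lock

    begin
      shop.is_locked = true                       # requests now get a 503
      rows = @shards[shop.shard_id].fetch(shop.id, [])
      @shards[dest_id][shop.id] = rows.dup        # copy
      unless @shards[dest_id][shop.id] == rows    # verify
        raise "verification failed"
      end
      shop.shard_id = dest_id                     # change shop.shard_id
      true
    ensure
      shop.is_locked = false
      @global_lock.unlock
    end
  end
end

shards = { 0 => { 42 => [:order_1, :order_2] }, 2 => {} }
shop   = MovableShop.new(42, 0, false)
moved  = ShopMover.new(shards).move(shop, 2)
```

The sketch never deletes the source rows: after the move, shard 0 still holds a copy for shop 42, which is exactly the orphaned data the cross-shard query methods have to skip.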
What about the old data? Deleted offline
Why not online? Too hard
Sharding checklist
• How to slice data
• Primary key generation
• Connection switching
• Query across shards
• Rebalancing
Results: trial by fire, Black Friday/Cyber Monday 2013.
~100k RPM higher than day-to-day
QPS 10-15k higher
All those spikes: flash sales
~80ms avg response time, zero downtime, all-time record sales for one day
Thanks! Thanks @camilolopez for the slides