Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Abril Pro Ruby
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Arthur Nogueira Neves
April 26, 2014
Programming
370
0
Share
Abril Pro Ruby
Sharding presentation given at abrilproruby.com/en
Arthur Nogueira Neves
April 26, 2014
More Decks by Arthur Nogueira Neves
See All by Arthur Nogueira Neves
Using multiple connections in ActiveRecord
arthurnn
3
990
Rails talk - RubyLightningTalksTO
arthurnn
0
100
Toronto MongoDB User Group
arthurnn
1
86
Other Decks in Programming
See All in Programming
Migration to Signals, Signal Forms, Resource API, and NgRx Signal Store @Angular Days 03/2026 Munich
manfredsteyer
PRO
0
210
Coding at the Speed of Thought: The New Era of Symfony Docker
dunglas
0
4.2k
それはエンジニアリングの糧である:AI開発のためにAIのOSSを開発する現場より / It serves as fuel for engineering: insights from the field of developing open-source AI for AI development.
nrslib
1
820
ポーリング処理廃止によるイベント駆動アーキテクチャへの移行
seitarof
3
1.3k
[PHPerKaigi 2026]PHPerKaigi2025の企画CodeGolfが最高すぎて社内で内製して半年運営して得た内製と運営の知見
ikezoemakoto
0
320
L’IA au service des devs : Anatomie d'un assistant de Code Review
toham
0
190
Codex CLI でつくる、Issue から merge までの開発フロー
amata1219
0
280
我々はなぜ「層」を分けるのか〜「関心の分離」と「抽象化」で手に入れる変更に強いシンプルな設計〜 #phperkaigi / PHPerKaigi 2026
shogogg
2
750
テレメトリーシグナルが導くパフォーマンス最適化 / Performance Optimization Driven by Telemetry Signals
seike460
PRO
2
200
Strategy for Finding a Problem for OSS: With Real Examples
kibitan
0
130
脱 雰囲気実装!AgentCoreを良い感じにWEBアプリケーションに組み込むために
takuyay0ne
3
430
PHPのバージョンアップ時にも役立ったAST(2026年版)
matsuo_atsushi
0
280
Featured
See All Featured
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
200
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.7k
State of Search Keynote: SEO is Dead Long Live SEO
ryanjones
0
170
Everyday Curiosity
cassininazir
0
180
Agile Actions for Facilitating Distributed Teams - ADO2019
mkilby
0
160
Reflections from 52 weeks, 52 projects
jeffersonlam
356
21k
Winning Ecommerce Organic Search in an AI Era - #searchnstuff2025
aleyda
1
1.9k
コードの90%をAIが書く世界で何が待っているのか / What awaits us in a world where 90% of the code is written by AI
rkaga
61
43k
The Anti-SEO Checklist Checklist. Pubcon Cyber Week
ryanjones
0
110
Tell your own story through comics
letsgokoyo
1
880
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.9k
How to build an LLM SEO readiness audit: a practical framework
nmsamuel
1
700
Transcript
shopify Sharding Shopify Arthur Neves / @arthurnn
Shopifacts
Scale
~100 app servers
~100k customers
Stack Ruby 2.1.1, Rails 4.0, MySQL 5.6 (percona)
App servers (Unicorn) Job servers (Resque)
~150k RPM with peaks up to ~400k RPM
~20k QPS
Moneys
$1.6B Annual GMV Sales over all the platform
$3.7k / min
Outages
None
What is sharding?
Challenges of sharding Why you probably should not shard
Sharding Slicing your data over more than one database
Rails and libs assume one DB
Makes it hard to do aggregated queries Querying across DBs
is a pain
AUTO_INCREMENT Does not quite work the same anymore
Likely custom code Be prepared to maintain
You might be able to buy your way out Normally
just MOAR CPUs will do
Tune and cache Worked for us (tm) for a long
time
Why did we shard? If it does sound like a
pain
Traffic and customers double each year
Black Friday/ Cybermonday 2x normal load
You cannot cache writes
81 2011 2012 •2xIntel E5640 2.67GHz •192GB of RAM •2
x 300GB OCZ Z- Drive R4 PCIe SSDs
81 2011 2012 • 4 x Intel E5-4650 2.7 GHz
•256GB of RAM •2 x 600GB OCZ Z-Drive •2xIntel E5640 2.67GHz •192GB of RAM •2 x 300GB OCZ Z- Drive R4 PCIe SSDs
Buy better CPUs? 8x CPUs get quite expensive
Horizontal scaling n-CPUs
Side benefits
Scale writes/reads horizontally
Smaller indexes Faster lookups
Failure isolation If a shard fails, it’s still bad, but
not as bad as if the only DB fails
Sharding checklist Things that needed to happen
How to slice our data? Sharding key Rails code
Primary key generation Simple AUTO_INCREMENT will not cut it anymore
MySQL config
Connection switching Teach rails app to know which db to
talk to Rails code
What about JOINs and Transactions? Rails code
Rebalancing How do we move things across shards? Rails code
+ more
Sharding Shopify Or what did we do in 2013?
How did we slice our data? Sharding key
App servers (Unicorn) “The” Database
Most things are scoped to a shop Yay! easy shop_id
is the sharding key
None
Denormalize shop_id Makes life so much easier
Add shard_id to shop We did this before we had
any shards at all
Shard Master db Master
Primary keys
AUTO_INCR does not work anymore Ids will not be unique
across DBs
Noeqd Like snowflake, k-sortable id generator in golang
Works well Ran in production for only one table that
was safe- ish to screw up
One more service Do we really want that?
(ノಠ益ಠ)ノ⼺彡┻━┻
MySQL config
auto_increment_increment Controls the interval between successive column values.
auto_increment_offset Determines the starting point for the AUTO_INCREMENT column value.
next_id = auto_increment_offset + N × auto_increment_increment N is a
member of 1,2,3,4...
Fix on an increment Different offset per shard
increment,offset ids shard_0 4,1 5,9,13,17 shard_1 4,2 6,10,14,18 shard_2 4,3
7,11,15,19
Make sure ids are big 62 Shopify/rails-bigint-pk
Connection switching Teach your app to know what db to
talk to
Block based shard context Sharding.with_shard(id) { #stuff } Sharding.with_shop(shop) {
#stuff }
def do_stuff_in_shard(shard_id) Sharding.with_shard(shard_id) do # db stuff happens here!!! end
end
Within the block sharded models use the current shard connection
Sharded models include Sharding::Concern, have shop_id column.
class Order < ActiveRecord::Base include Sharding::Concern end
Non-sharded models do not care Always talk to the master
DB
Connection Swtching
HTTP requests
None
Domain to Shop
Shop to shard every shop has a shard_id.
Now run the request in a shard
around_action :with_shop ! def with_shop ! @shop = ShopManager.shop_for(request.host) !
Sharding.with_shop(@shop) yield # Actual request happens here. end end
Background jobs
if params[:shop_id]
easy
class CommentSpamCheckerJob include BackgroundQueue::Low include Sharding::BackgroundQueue::SelectShard include BackgroundQueue::Locking # perform
will have the shard context of # params[:shop_id] end
What about JOINs and Transactions? :(
Can’t just iterate A shop can be locked, extraneous rows
from moves
Take shared locks while iterating The user decides what to
do with locked shops, ignore/raise
They also ignore data left orphaned It will get deleted,
but might be hours
Primitives 84
query_on_each_shard Results from an ActiveRecord::Relation on each shard
things = [] rel = Model.preload(:checkouts, :order) .where(id: params[:id].split(+)) !
Sharding.query_on_each_shard(rel, :on_lock => :raise) do |recs| things.concat(recs) end
find_in_batches_on_each_shar d Like find_each except queries every shard
r = model.unscoped ! Sharding.find_in_batches_on_each_shard(r, on_lock: :ignore) do |recs| recs.each
do |record| reserialize_record(record, params[:fields]) progress.tick end end
If someone absolutely must run custom SQL Take the global
lock. Also, do not do this.
Rebalancing Do we move things across shards?
None
Lock-and-move
Global exclusive lock Maintenance tasks, some cron jobs and the
shop mover
shop.is_locked No need to reach to a locking server, just
HTTP 503 if set
Per shop shared lock Taken by the query across shard
methods as they iterate, important for things that go across shards
Zookeeper or Redis
$ script/move_shop --shop_id=42 \--dest_id=2 --concurrency=16
T0 T1 (+100secs) T2 T3 T4 T5 Take global lock
or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Take
global lock or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Jobs
have released shared locks Take global lock or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Fork,
copy, verify Jobs have released shared locks Take global lock or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Fork,
copy, verify Change shop.shard_id Jobs have released shared locks Take global lock or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Fork,
copy, verify Change shop.shard_id Jobs have released shared locks shop.is_locked = false Take global lock or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Fork,
copy, verify Change shop.shard_id Jobs have released shared locks shop.is_locked = false Take global lock or bail, shop.is_locked = true
What about the old data? Deleted offline
Why not online? Too hard
Sharding checklist • How to slice data • Primary key
generation • Connection switching • Query across shards • Rebalancing 107
Results Trial by fire Black Friday/Cyber Monday 2013.
~100k RPM higher than day-to-day
None
QPS 10-15k higher
None
All those spikes Flash sales 113
None
~80ms avg response time zero downtime, all time record sales
for one day
None
None
Thanks! Thanks @camilolopez for the slides