Abril Pro Ruby
Arthur Nogueira Neves
April 26, 2014
Sharding presentation given at abrilproruby.com/en
Transcript
Sharding Shopify. Arthur Neves / @arthurnn
Shopifacts
Scale: ~100 app servers, ~100k customers
Stack: Ruby 2.1.1, Rails 4.0, MySQL 5.6 (Percona); app servers (Unicorn), job servers (Resque)
~150k RPM, with peaks up to ~400k RPM
~20k QPS
Moneys: $1.6B annual GMV (sales across the whole platform), $3.7k / min
Outages
What is sharding?
Challenges of sharding: why you probably should not shard
Sharding: slicing your data across more than one database
Rails and its libraries assume a single DB
Aggregate queries get hard: querying across DBs is a pain
AUTO_INCREMENT does not quite work the same anymore
Likely custom code: be prepared to maintain it
You might be able to buy your way out: normally, just MOAR CPUs will do
Tune and cache: worked for us (tm) for a long time
Why did we shard, if it sounds like such a pain?
Traffic and customers double each year
Black Friday / Cyber Monday: 2x normal load
You cannot cache writes
Hardware, 2011 vs. 2012
2011: 2 x Intel E5640 2.67 GHz, 192 GB of RAM, 2 x 300 GB OCZ Z-Drive R4 PCIe SSDs
2012: 4 x Intel E5-4650 2.7 GHz, 256 GB of RAM, 2 x 600 GB OCZ Z-Drive
Buy better CPUs? 8x CPUs get quite expensive
Horizontal scaling: n CPUs
Side benefits
Scale writes/reads horizontally
Smaller indexes: faster lookups
Failure isolation: if a shard fails, it's still bad, but not as bad as when the only DB fails
Sharding checklist: things that needed to happen
How to slice our data? The sharding key (Rails code)
Primary key generation: a simple AUTO_INCREMENT will not cut it anymore (MySQL config)
Connection switching: teach the Rails app which DB to talk to (Rails code)
What about JOINs and transactions? (Rails code)
Rebalancing: how do we move things across shards? (Rails code)
+ more
Sharding Shopify, or: what did we do in 2013?
How did we slice our data? The sharding key
[Diagram: app servers (Unicorn) talking to “the” database]
Most things are scoped to a shop. Yay! Easy: shop_id is the sharding key
Denormalize shop_id onto every shop-scoped table: makes life so much easier
Add shard_id to shop: we did this before we had any shards at all
[Diagram: shard and master DB]
Primary keys
AUTO_INCREMENT does not work anymore: ids will not be unique across DBs
Noeqd: like Snowflake, a k-sortable id generator in Go
Works well: ran in production, but only for one table that was safe-ish to screw up
One more service: do we really want that?
(ノಠ益ಠ)ノ彡┻━┻
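For comparison with the MySQL approach that follows, here is a rough sketch of the Snowflake-style layout that k-sortable generators like Noeqd are modelled on: milliseconds since an epoch in the high bits, then a worker id, then a per-millisecond sequence. The 41/10/12 bit split, the epoch, and the `KSortableId` class are assumptions for illustration, not Noeqd's actual format or code.

```ruby
# Hypothetical Snowflake-style, k-sortable id generator (not Noeqd's code).
# Assumed layout: 41 bits of milliseconds since EPOCH_MS, 10 bits of
# worker id, 12 bits of per-millisecond sequence.
class KSortableId
  WORKER_BITS = 10
  SEQUENCE_BITS = 12
  MAX_WORKER = (1 << WORKER_BITS) - 1
  MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
  EPOCH_MS = 1_288_834_974_657 # assumed custom epoch (Twitter Snowflake's)

  def initialize(worker_id, clock: -> { Process.clock_gettime(Process::CLOCK_REALTIME, :millisecond) })
    raise ArgumentError, "worker_id must be 0..#{MAX_WORKER}" unless worker_id.between?(0, MAX_WORKER)

    @worker_id = worker_id
    @clock = clock
    @last_ms = -1
    @sequence = 0
  end

  def next_id
    now = @clock.call
    raise "clock moved backwards" if now < @last_ms

    if now == @last_ms
      @sequence = (@sequence + 1) & MAX_SEQUENCE
      # Sequence exhausted within this millisecond: spin until the clock ticks.
      now = @clock.call while @sequence.zero? && now <= @last_ms
    else
      @sequence = 0
    end
    @last_ms = now
    ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)) | (@worker_id << SEQUENCE_BITS) | @sequence
  end
end

gen = KSortableId.new(3)
a = gen.next_id
b = gen.next_id
puts a < b # ids from one worker increase over time
```

The worker id is what keeps ids unique across machines, and the timestamp prefix is what makes them roughly sortable. It also illustrates the slide's objection: every id now depends on a separately run, correctly configured service.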
MySQL config
auto_increment_increment: controls the interval between successive column values
auto_increment_offset: determines the starting point for the AUTO_INCREMENT column value
next_id = auto_increment_offset + N × auto_increment_increment, where N is a member of {1, 2, 3, 4, ...}
Fix the increment; use a different offset per shard
shard     increment, offset   ids
shard_0   4, 1                5, 9, 13, 17
shard_1   4, 2                6, 10, 14, 18
shard_2   4, 3                7, 11, 15, 19
Make sure ids are big (BIGINT): Shopify/rails-bigint-pk
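A minimal sketch of the increment/offset arithmetic above, assuming a fixed increment of 4 and offset `i + 1` for shard `i` as in the table. `ShardIds`, `ids_for`, and `origin_shard_for` are hypothetical helpers, not part of MySQL or Shopify's code. The sketch reproduces the id series from the table and shows that the shard a row was originally inserted on can be recovered from its id.

```ruby
# Hypothetical helpers for the auto_increment_increment/offset scheme:
# every shard uses the same increment, shard i uses offset i + 1.
module ShardIds
  INCREMENT = 4 # auto_increment_increment, identical on every shard

  def self.offset_for(shard_index)
    shard_index + 1 # auto_increment_offset: shard_0 -> 1, shard_1 -> 2, ...
  end

  # next_id = offset + N * increment, for N = 1, 2, 3, ...
  def self.ids_for(shard_index, count)
    offset = offset_for(shard_index)
    (1..count).map { |n| offset + n * INCREMENT }
  end

  # Shard on which an id was generated (the row may have moved since).
  def self.origin_shard_for(id)
    (id - 1) % INCREMENT
  end
end

(0..2).each { |i| puts "shard_#{i}: #{ShardIds.ids_for(i, 4).inspect}" }
# shard_0: [5, 9, 13, 17]
# shard_1: [6, 10, 14, 18]
# shard_2: [7, 11, 15, 19]
```

With the increment fixed at 4, only four offsets (1 to 4) exist, so a fifth shard would need a larger increment; choosing a generous increment up front leaves room to grow. Each shard also burns through the id space `INCREMENT` times faster, which is one more reason for 64-bit keys.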
Connection switching: teach your app which DB to talk to
Block-based shard context: Sharding.with_shard(id) { ... } and Sharding.with_shop(shop) { ... }

    def do_stuff_in_shard(shard_id)
      Sharding.with_shard(shard_id) do
        # db stuff happens here!!!
      end
    end

Within the block, sharded models use the current shard's connection
Sharded models include Sharding::Concern and have a shop_id column

    class Order < ActiveRecord::Base
      include Sharding::Concern
    end

Non-sharded models do not care: they always talk to the master DB
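A minimal sketch of how a block-based shard context can behave, assuming the current shard is kept in `Thread.current[]` (which Ruby scopes per fiber) and restored when the block exits, including on exceptions, so blocks can nest. `ShardContext`, `ShopRecord`, and `connection_name` are hypothetical stand-ins; the talk does not show how Shopify's `Sharding` module hooks into ActiveRecord connections.

```ruby
# Hypothetical stand-in for a shop row that knows its shard.
ShopRecord = Struct.new(:id, :shard_id)

# Hypothetical block-based shard context (not Shopify's Sharding module).
module ShardContext
  KEY = :current_shard_id

  def self.current_shard_id
    Thread.current[KEY] # Thread#[] values are fiber-local in Ruby
  end

  # Run the block against shard_id, restoring the previous shard afterwards
  # (even on exceptions) so that blocks can nest.
  def self.with_shard(shard_id)
    previous = Thread.current[KEY]
    begin
      Thread.current[KEY] = shard_id
      yield
    ensure
      Thread.current[KEY] = previous
    end
  end

  def self.with_shop(shop, &block)
    with_shard(shop.shard_id, &block)
  end

  # What a sharded model could consult to pick a connection; non-sharded
  # models would skip this and always use the master DB.
  def self.connection_name
    shard = current_shard_id
    raise "no shard selected for a sharded model" if shard.nil?

    "shard_#{shard}"
  end
end

names = []
ShardContext.with_shop(ShopRecord.new(42, 2)) do
  names << ShardContext.connection_name
  ShardContext.with_shard(0) { names << ShardContext.connection_name }
  names << ShardContext.connection_name
end
p names # => ["shard_2", "shard_0", "shard_2"]
```

Raising when no shard is selected makes a forgotten `with_shard` fail loudly instead of silently querying the wrong database.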
Connection switching
HTTP requests
Domain to shop
Shop to shard: every shop has a shard_id
Now run the request in a shard:

    around_action :with_shop

    def with_shop
      @shop = ShopManager.shop_for(request.host)
      Sharding.with_shop(@shop) do
        yield # Actual request happens here.
      end
    end
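The domain → shop → shard path can be sketched in plain Ruby, assuming an in-memory table in place of `ShopManager.shop_for` and a `Thread.current[]` shard context like the one `Sharding.with_shop` provides. `RequestRouter`, `ShopRow`, `SHOPS_BY_DOMAIN`, and the example domains are hypothetical.

```ruby
# Hypothetical in-memory stand-ins for the shops table (not ShopManager).
ShopRow = Struct.new(:id, :domain, :shard_id)

SHOPS_BY_DOMAIN = {
  "snowdevil.example.com" => ShopRow.new(1, "snowdevil.example.com", 0),
  "teapots.example.com" => ShopRow.new(2, "teapots.example.com", 1),
}.freeze

module RequestRouter
  # Domain to shop: match the host case-insensitively, ignoring any port.
  def self.shop_for(host)
    SHOPS_BY_DOMAIN.fetch(host.downcase.split(":").first) do
      raise KeyError, "unknown shop domain: #{host}"
    end
  end

  # Shop to shard, then run the block (the "actual request") in that shard,
  # the way the around_action wraps every controller action.
  def self.with_shop_for(host)
    shop = shop_for(host)
    previous = Thread.current[:current_shard_id]
    begin
      Thread.current[:current_shard_id] = shop.shard_id
      yield shop
    ensure
      Thread.current[:current_shard_id] = previous
    end
  end
end

RequestRouter.with_shop_for("Teapots.example.com:443") do |shop|
  puts "shop #{shop.id} on shard #{Thread.current[:current_shard_id]}" # shop 2 on shard 1
end
```

Because the shop lookup happens before any shard is selected, it reads from the master DB, which is where the shops table (and each shop's `shard_id`) lives.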
Background jobs
if params[:shop_id]: easy

    class CommentSpamCheckerJob
      include BackgroundQueue::Low
      include Sharding::BackgroundQueue::SelectShard
      include BackgroundQueue::Locking

      # perform will have the shard context of
      # params[:shop_id]
    end
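One way a mixin like `Sharding::BackgroundQueue::SelectShard` could work, sketched under assumptions: `perform` receives a `params` hash, and a prepended module looks up the shard for `params[:shop_id]` and runs the job's own `perform` inside that shard context. `SelectShardSketch`, `SHARD_FOR_SHOP`, and `DemoSpamCheckJob` are hypothetical; the talk does not show the real module's implementation.

```ruby
# Hypothetical shop_id => shard_id lookup standing in for the master DB.
SHARD_FOR_SHOP = { 42 => 2, 7 => 0 }.freeze

# Hypothetical sketch of a SelectShard-style mixin (not Shopify's module).
module SelectShardSketch
  # Prepended, so this runs first and `super` reaches the job's own perform.
  def perform(params)
    # In this sketch every job must carry a shop_id; fetch raises otherwise.
    shard_id = SHARD_FOR_SHOP.fetch(params.fetch(:shop_id))
    previous = Thread.current[:current_shard_id]
    begin
      Thread.current[:current_shard_id] = shard_id
      super
    ensure
      Thread.current[:current_shard_id] = previous
    end
  end
end

class DemoSpamCheckJob
  prepend SelectShardSketch

  attr_reader :ran_on_shard

  def perform(params)
    # The shard context is already set from params[:shop_id] here.
    @ran_on_shard = Thread.current[:current_shard_id]
  end
end

job = DemoSpamCheckJob.new
job.perform({ shop_id: 42 })
puts job.ran_on_shard # => 2
```

Using `prepend` rather than `include` lets the mixin wrap the class's own `perform` without the job author calling anything explicitly.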
What about JOINs and transactions? :(
Can't just iterate: a shop can be locked, and there may be extraneous rows left over from moves
Take shared locks while iterating; the user decides what to do with locked shops: ignore or raise
They also ignore orphaned data: it will get deleted, but that might take hours
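The iteration rules above can be sketched with in-memory shards, assuming each shard is a list of rows carrying a `shop_id`, the master knows each shop's current `shard_id`, a list of locked shop ids stands in for shops being moved, and a counter hash stands in for shared locks. `each_row_on_each_shard`, `ShopLockedError`, and `Row` are hypothetical names, not Shopify's implementation.

```ruby
class ShopLockedError < StandardError; end

# Hypothetical row layout; real rows would be ActiveRecord models.
Row = Struct.new(:id, :shop_id)

# Hypothetical lock-aware iteration over in-memory shards (shard_id => rows).
def each_row_on_each_shard(shards, shard_for_shop:, locked_shops:, shared_locks:, on_lock:)
  raise ArgumentError, "on_lock must be :raise or :ignore" unless %i[raise ignore].include?(on_lock)

  shards.each do |shard_id, rows|
    rows.group_by(&:shop_id).each do |shop_id, shop_rows|
      # Orphaned rows left behind by a move: the shop now lives elsewhere.
      next unless shard_for_shop[shop_id] == shard_id

      if locked_shops.include?(shop_id)
        raise ShopLockedError, "shop #{shop_id} is locked" if on_lock == :raise

        next # on_lock: :ignore skips the shop
      end

      # Hold a shared lock on the shop while its rows are yielded.
      shared_locks[shop_id] += 1
      begin
        shop_rows.each { |row| yield row }
      ensure
        shared_locks[shop_id] -= 1
      end
    end
  end
end

shards = {
  0 => [Row.new(1, 10), Row.new(5, 11), Row.new(9, 12)],
  1 => [Row.new(2, 10), Row.new(6, 13)], # row 2: shop 10 moved to shard 0
}
shard_for_shop = { 10 => 0, 11 => 0, 12 => 0, 13 => 1 }
locks = Hash.new(0)
ids = []
each_row_on_each_shard(shards, shard_for_shop: shard_for_shop, locked_shops: [12],
                               shared_locks: locks, on_lock: :ignore) { |row| ids << row.id }
p ids # => [1, 5, 6]
```

Row 9 is skipped because shop 12 is locked, and row 2 is skipped as an orphan; with `on_lock: :raise`, the same call stops with `ShopLockedError` when it reaches shop 12.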
Primitives
query_on_each_shard: results from an ActiveRecord::Relation on each shard

    things = []
    rel = Model.preload(:checkouts, :order).where(id: params[:id].split('+'))
    Sharding.query_on_each_shard(rel, :on_lock => :raise) do |recs|
      things.concat(recs)
    end

find_in_batches_on_each_shard: like find_each, except it queries every shard

    r = model.unscoped
    Sharding.find_in_batches_on_each_shard(r, on_lock: :ignore) do |recs|
      recs.each do |record|
        reserialize_record(record, params[:fields])
        progress.tick
      end
    end
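A rough sketch of the batching half of `find_in_batches_on_each_shard`, assuming each shard is an in-memory array of rows and using keyset pagination (remember the last id, fetch the next batch after it), which is how ActiveRecord's own batch finders page. `fetch_batch_after`, `in_batches_on_each_shard`, and the `batch_size:` default are hypothetical, and the locking rules described earlier are left out.

```ruby
# Hypothetical stand-in for
#   SELECT * FROM t WHERE id > ? ORDER BY id LIMIT ?
# run against one in-memory shard.
def fetch_batch_after(rows, after_id, limit)
  rows.select { |r| r[:id] > after_id }.sort_by { |r| r[:id] }.first(limit)
end

# Hypothetical keyset-paginated batching over every shard in turn: remember
# the last id seen and fetch the next batch after it until the shard runs dry.
def in_batches_on_each_shard(shards, batch_size: 1000)
  shards.each do |shard_id, rows|
    last_id = 0 # AUTO_INCREMENT ids start above 0
    loop do
      batch = fetch_batch_after(rows, last_id, batch_size)
      break if batch.empty?

      yield shard_id, batch
      last_id = batch.last[:id]
    end
  end
end

shards = {
  0 => [{ id: 5 }, { id: 1 }, { id: 9 }],
  1 => [{ id: 2 }, { id: 6 }],
}
seen = []
in_batches_on_each_shard(shards, batch_size: 2) do |shard_id, batch|
  seen << [shard_id, batch.map { |r| r[:id] }]
end
p seen # => [[0, [1, 5]], [0, [9]], [1, [2, 6]]]
```

Paging by `id > last_id` rather than by OFFSET keeps each query cheap on a large table and does not skip rows if earlier ones are deleted mid-iteration.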
If someone absolutely must run custom SQL: take the global lock. Also, do not do this.
Rebalancing: how do we move things across shards?
Lock-and-move
Global exclusive lock: maintenance tasks, some cron jobs and the shop mover
shop.is_locked: no need to reach out to a locking server, just return HTTP 503 if it is set
Per-shop shared lock: taken by the cross-shard query methods as they iterate; important for anything that goes across shards
ZooKeeper or Redis
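An in-memory sketch of the locks listed above, assuming a single process so that a `Mutex` stands in for ZooKeeper or Redis. The global lock is exclusive and non-blocking (take it or bail), per-shop shared locks are counters, and a request for a shop with `is_locked` set is answered with 503 without a lock-server round trip. `LockSketch`, `ShopState`, and `request_status` are hypothetical names.

```ruby
# Hypothetical single-process stand-in for the ZooKeeper/Redis-backed locks.
class LockSketch
  def initialize
    @mutex = Mutex.new
    @global_held = false
    @shared = Hash.new(0) # shop_id => number of shared holders
  end

  # Global exclusive lock: returns false instead of waiting ("or bail").
  def try_global_lock
    @mutex.synchronize do
      return false if @global_held

      @global_held = true
    end
  end

  def release_global_lock
    @mutex.synchronize { @global_held = false }
  end

  # Per-shop shared lock, held by cross-shard iterations while they run.
  def with_shared_lock(shop_id)
    @mutex.synchronize { @shared[shop_id] += 1 }
    begin
      yield
    ensure
      @mutex.synchronize { @shared[shop_id] -= 1 }
    end
  end

  def shared_holders(shop_id)
    @mutex.synchronize { @shared[shop_id] }
  end
end

# Hypothetical shop state as seen by the HTTP layer.
ShopState = Struct.new(:id, :is_locked)

# No lock server round trip: a locked shop is answered with 503 directly.
def request_status(shop)
  shop.is_locked ? 503 : 200
end

locks = LockSketch.new
puts locks.try_global_lock # => true
puts locks.try_global_lock # => false, someone already holds it
```

The shop mover would check `shared_holders` and wait for it to reach zero before copying, which is the "jobs have released shared locks" step of the move.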
$ script/move_shop --shop_id=42 --dest_id=2 --concurrency=16
T0: take the global lock or bail; shop.is_locked = true
T1 (+100 secs): Unicorns dead
T2: jobs have released their shared locks
T3: fork, copy, verify
T4: change shop.shard_id
T5: shop.is_locked = false
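The lock-and-move sequence can be condensed into a single-process Ruby sketch, assuming shards are in-memory arrays of row hashes, a `Mutex` stands in for the global lock, the T1 wait (read here as "any request that started before the lock is done") is passed in as a callable, and verification checks that every source row id exists on the destination. `move_shop`, `MovableShop`, and their arguments are hypothetical; the real `script/move_shop` forks workers (`--concurrency`), which this sketch does not.

```ruby
# Hypothetical shop record for the sketch.
MovableShop = Struct.new(:id, :shard_id, :is_locked)

# Hypothetical single-process lock-and-move (not script/move_shop).
# shards: shard_id => array of row hashes; global_lock: a Mutex standing in
# for the global exclusive lock; shared_holders: ->(shop_id) { count }.
def move_shop(shop, dest_id, shards:, global_lock:, shared_holders:, wait_for_unicorns: -> {})
  return false unless global_lock.try_lock # T0: take the global lock or bail

  begin
    shop.is_locked = true                  # T0: requests for the shop now get 503
    wait_for_unicorns.call                 # T1: assumed wait for in-flight requests
    sleep 0.01 until shared_holders.call(shop.id).zero? # T2: shared locks released

    rows = shards.fetch(shop.shard_id).select { |r| r[:shop_id] == shop.id }
    shards.fetch(dest_id).concat(rows.map(&:dup)) # T3: copy ...
    dest_ids = shards[dest_id].select { |r| r[:shop_id] == shop.id }.map { |r| r[:id] }
    missing = rows.map { |r| r[:id] } - dest_ids  # ... and verify
    raise "verification failed: missing #{missing.inspect}" unless missing.empty?

    shop.shard_id = dest_id                # T4: point the shop at its new shard
    true
  ensure
    shop.is_locked = false                 # T5: unlock; old rows stay for offline deletion
    global_lock.unlock
  end
end

shards = { 0 => [{ id: 1, shop_id: 42 }, { id: 5, shop_id: 7 }], 1 => [] }
shop = MovableShop.new(42, 0, false)
lock = Mutex.new
moved = move_shop(shop, 1, shards: shards, global_lock: lock, shared_holders: ->(_id) { 0 })
p [moved, shop.shard_id] # => [true, 1]
```

The source rows are deliberately left in place: until they are cleaned up they are exactly the orphans that the cross-shard iteration primitives skip.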
What about the old data? It is deleted offline
Why not online? Too hard
Sharding checklist
• How to slice data
• Primary key generation
• Connection switching
• Query across shards
• Rebalancing
Results: trial by fire, Black Friday/Cyber Monday 2013
~100k RPM higher than day-to-day
QPS 10-15k higher
All those spikes: flash sales
~80ms avg response time, zero downtime, all-time record sales for one day
Thanks! Thanks to @camilolopez for the slides