Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Abril Pro Ruby
Search
Arthur Nogueira Neves
April 26, 2014
Programming
0
290
Abril Pro Ruby
Sharding presentation given at abrilproruby.com/en
Arthur Nogueira Neves
April 26, 2014
Tweet
Share
More Decks by Arthur Nogueira Neves
See All by Arthur Nogueira Neves
Using multiple connections in ActiveRecord
arthurnn
3
920
Rails talk - RubyLightningTalksTO
arthurnn
0
90
Toronto MongoDB User Group
arthurnn
1
79
Other Decks in Programming
See All in Programming
Kotlin 2.0が与えるAndroid開発の進化
masayukisuda
1
230
ドメイン駆動設計を実践するために必要なもの
bikisuke
3
320
Amazon BedrockでサーバレスなAIお料理ボットを作成する!!
tosuri13
0
110
Playwrightから始めるVisual Regression Testingのススメ by とっと
totto2727
2
1.9k
どうしてこうなった?から理解するActive Recordの関連の裏側
willnet
5
550
Jakarta EE meets AI
ivargrimstad
1
140
Amazon Neptuneで始める初めてのグラフDB ー グラフDBを使う意味を考える ー
satoshi256kbyte
2
240
Swift Concurrencyとレースコンディション
objectiveaudio
1
400
Mastering AsyncSequence - 使う・作る・他のデザインパターン(クロージャ、Delegate など)から移行する
treastrain
4
1.5k
Appleの新しいプライバシー要件対応: ノーコードアプリ プラットフォームの実践事例
nao_randd
1
540
状態管理ライブラリZustandの導入から運用まで
k1tikurisu
3
400
私のEbitengineの第一歩
qt_luigi
0
440
Featured
See All Featured
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
34
1.7k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
41
6.5k
Become a Pro
speakerdeck
PRO
22
4.9k
BBQ
matthewcrist
83
9.1k
Designing Experiences People Love
moore
138
23k
How STYLIGHT went responsive
nonsquared
93
5.1k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
230
17k
Scaling GitHub
holman
458
140k
The Brand Is Dead. Long Live the Brand.
mthomps
53
37k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
25
1.3k
jQuery: Nuts, Bolts and Bling
dougneiner
61
7.4k
Learning to Love Humans: Emotional Interface Design
aarron
270
40k
Transcript
shopify Sharding Shopify Arthur Neves / @arthurnn
Shopifacts
Scale
~100 app servers
~100k customers
Stack Ruby 2.1.1, Rails 4.0, MySQL 5.6 (percona)
App servers (Unicorn) Job servers (Resque)
~150k RPM with peaks up to ~400k RPM
~20k QPS
Moneys
$1.6B Annual GMV Sales over all the platform
$3.7k / min
Outages
None
What is sharding?
Challenges of sharding Why you probably should not shard
Sharding Slicing your data over more than one database
Rails and libs assume one DB
Makes it hard to do aggregated queries Querying across DBs
is a pain
AUTO_INCREMENT Does not quite work the same anymore
Likely custom code Be prepared to maintain
You might be able to buy your way out Normally
just MOAR CPUs will do
Tune and cache Worked for us (tm) for a long
time
Why did we shard? If it does sound like a
pain
Traffic and customers double each year
Black Friday/ Cybermonday 2x normal load
You cannot cache writes
81 2011 2012 •2xIntel E5640 2.67GHz •192GB of RAM •2
x 300GB OCZ Z- Drive R4 PCIe SSDs
81 2011 2012 • 4 x Intel E5-4650 2.7 GHz
•256GB of RAM •2 x 600GB OCZ Z-Drive •2xIntel E5640 2.67GHz •192GB of RAM •2 x 300GB OCZ Z- Drive R4 PCIe SSDs
Buy better CPUs? 8x CPUs get quite expensive
Horizontal scaling n-CPUs
Side benefits
Scale writes/reads horizontally
Smaller indexes Faster lookups
Failure isolation If a shard fails, it’s still bad, but
not as bad as if the only DB fails
Sharding checklist Things that needed to happen
How to slice our data? Sharding key Rails code
Primary key generation Simple AUTO_INCREMENT will not cut it anymore
MySQL config
Connection switching Teach rails app to know which db to
talk to Rails code
What about JOINs and Transactions? Rails code
Rebalancing How do we move things across shards? Rails code
+ more
Sharding Shopify Or what did we do in 2013?
How did we slice our data? Sharding key
App servers (Unicorn) “The” Database
Most things are scoped to a shop Yay! easy shop_id
is the sharding key
None
Denormalize shop_id Makes life so much easier
Add shard_id to shop We did this before we had
any shards at all
Shard Master db Master
Primary keys
AUTO_INCR does not work anymore Ids will not be unique
across DBs
Noeqd Like snowflake, k-sortable id generator in golang
Works well Ran in production for only one table that
was safe- ish to screw up
One more service Do we really want that?
(ノಠ益ಠ)ノ⼺彡┻━┻
MySQL config
auto_increment_increment Controls the interval between successive column values.
auto_increment_offset Determines the starting point for the AUTO_INCREMENT column value.
next_id = auto_increment_offset + N × auto_increment_increment N is a
member of 1,2,3,4...
Fix on an increment Different offset per shard
increment,offset ids shard_0 4,1 5,9,13,17 shard_1 4,2 6,10,14,18 shard_2 4,3
7,11,15,19
Make sure ids are big 62 Shopify/rails-bigint-pk
Connection switching Teach your app to know what db to
talk to
Block based shard context Sharding.with_shard(id) { #stuff } Sharding.with_shop(shop) {
#stuff }
def do_stuff_in_shard(shard_id) Sharding.with_shard(shard_id) do # db stuff happens here!!! end
end
Within the block sharded models use the current shard connection
Sharded models include Sharding::Concern, have shop_id column.
class Order < ActiveRecord::Base include Sharding::Concern end
Non-sharded models do not care Always talk to the master
DB
Connection Swtching
HTTP requests
None
Domain to Shop
Shop to shard every shop has a shard_id.
Now run the request in a shard
around_action :with_shop ! def with_shop ! @shop = ShopManager.shop_for(request.host) !
Sharding.with_shop(@shop) yield # Actual request happens here. end end
Background jobs
if params[:shop_id]
easy
class CommentSpamCheckerJob include BackgroundQueue::Low include Sharding::BackgroundQueue::SelectShard include BackgroundQueue::Locking # perform
will have the shard context of # params[:shop_id] end
What about JOINs and Transactions? :(
Can’t just iterate A shop can be locked, extraneous rows
from moves
Take shared locks while iterating The user decides what to
do with locked shops, ignore/raise
They also ignore data left orphaned It will get deleted,
but might be hours
Primitives 84
query_on_each_shard Results from an ActiveRecord::Relation on each shard
things = [] rel = Model.preload(:checkouts, :order) .where(id: params[:id].split(+)) !
Sharding.query_on_each_shard(rel, :on_lock => :raise) do |recs| things.concat(recs) end
find_in_batches_on_each_shar d Like find_each except queries every shard
r = model.unscoped ! Sharding.find_in_batches_on_each_shard(r, on_lock: :ignore) do |recs| recs.each
do |record| reserialize_record(record, params[:fields]) progress.tick end end
If someone absolutely must run custom SQL Take the global
lock. Also, do not do this.
Rebalancing Do we move things across shards?
None
Lock-and-move
Global exclusive lock Maintenance tasks, some cron jobs and the
shop mover
shop.is_locked No need to reach to a locking server, just
HTTP 503 if set
Per shop shared lock Taken by the query across shard
methods as they iterate, important for things that go across shards
Zookeeper or Redis
$ script/move_shop --shop_id=42 \--dest_id=2 --concurrency=16
T0 T1 (+100secs) T2 T3 T4 T5 Take global lock
or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Take
global lock or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Jobs
have released shared locks Take global lock or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Fork,
copy, verify Jobs have released shared locks Take global lock or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Fork,
copy, verify Change shop.shard_id Jobs have released shared locks Take global lock or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Fork,
copy, verify Change shop.shard_id Jobs have released shared locks shop.is_locked = false Take global lock or bail, shop.is_locked = true
T0 T1 (+100secs) T2 T3 T4 T5 Unicorns dead Fork,
copy, verify Change shop.shard_id Jobs have released shared locks shop.is_locked = false Take global lock or bail, shop.is_locked = true
What about the old data? Deleted offline
Why not online? Too hard
Sharding checklist • How to slice data • Primary key
generation • Connection switching • Query across shards • Rebalancing 107
Results Trial by fire Black Friday/Cyber Monday 2013.
~100k RPM higher than day-to-day
None
QPS 10-15k higher
None
All those spikes Flash sales 113
None
~80ms avg response time zero downtime, all time record sales
for one day
None
None
Thanks! Thanks @camilolopez for the slides