Abril Pro Ruby
Arthur Nogueira Neves
April 26, 2014
Sharding presentation given at abrilproruby.com/en
Transcript
Sharding Shopify
Arthur Neves / @arthurnn
Shopifacts
Scale
~100 app servers
~100k customers
Stack: Ruby 2.1.1, Rails 4.0, MySQL 5.6 (Percona)
App servers (Unicorn), job servers (Resque)
~150k RPM with peaks up to ~400k RPM
~20k QPS
Moneys
$1.6B annual GMV (sales across the whole platform)
$3.7k / min
Outages
None
What is sharding?
Challenges of sharding: why you probably should not shard
Sharding: slicing your data across more than one database
Rails and libs assume one DB
Aggregated queries get hard: querying across DBs is a pain
AUTO_INCREMENT: does not quite work the same anymore
Likely custom code: be prepared to maintain it
You might be able to buy your way out: normally, just MOAR CPUs will do
Tune and cache: worked for us (tm) for a long time
Why did we shard, if it sounds like such a pain?
Traffic and customers double each year
Black Friday/Cyber Monday: 2x normal load
You cannot cache writes
2011: 2 x Intel E5640 2.67GHz, 192GB of RAM, 2 x 300GB OCZ Z-Drive R4 PCIe SSDs
2012: 4 x Intel E5-4650 2.7GHz, 256GB of RAM, 2 x 600GB OCZ Z-Drive
Buy better CPUs? 8x CPUs get quite expensive
Horizontal scaling: n CPUs
Side benefits
Scale writes/reads horizontally
Smaller indexes: faster lookups
Failure isolation: if a shard fails, it’s still bad, but not as bad as when the only DB fails
Sharding checklist: things that needed to happen
How to slice our data? Sharding key (Rails code)
Primary key generation: simple AUTO_INCREMENT will not cut it anymore (MySQL config)
Connection switching: teach the Rails app which DB to talk to (Rails code)
What about JOINs and transactions? (Rails code)
Rebalancing: how do we move things across shards? (Rails code)
+ more
Sharding Shopify, or: what did we do in 2013?
How did we slice our data? Sharding key
App servers (Unicorn) “The” Database
Most things are scoped to a shop. Yay, easy: shop_id is the sharding key
Denormalize shop_id: makes life so much easier
Add shard_id to shops: we did this before we had any shards at all
Shard Master db Master
Primary keys
AUTO_INCREMENT does not work anymore: ids will not be unique across DBs
Noeqd: like Snowflake, a k-sortable id generator written in Go
Works well: ran in production, but only for one table that was safe-ish to screw up
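To make "k-sortable" concrete: if the high bits of an id are a timestamp, ids sort roughly by creation time, and a per-node worker id keeps nodes from colliding. The Ruby sketch below assumes Twitter Snowflake's bit layout and epoch; it is not Noeqd's code (Noeqd is written in Go), and `KSortableIdGenerator` is a hypothetical name.

```ruby
# Sketch of a Snowflake-style, k-sortable 64-bit id generator.
# ASSUMPTION: the bit layout (41-bit millisecond timestamp, 10-bit worker id,
# 12-bit sequence) and the epoch follow Twitter Snowflake; Noeqd may differ.
class KSortableIdGenerator
  EPOCH_MS      = 1_288_834_974_657 # Snowflake's custom epoch (Nov 2010)
  WORKER_BITS   = 10
  SEQUENCE_BITS = 12
  MAX_WORKER    = (1 << WORKER_BITS) - 1
  MAX_SEQUENCE  = (1 << SEQUENCE_BITS) - 1

  def initialize(worker_id, clock: -> { Process.clock_gettime(Process::CLOCK_REALTIME, :millisecond) })
    raise ArgumentError, "worker_id must be 0..#{MAX_WORKER}" unless worker_id.between?(0, MAX_WORKER)

    @worker_id = worker_id
    @clock = clock
    @last_ms = -1
    @sequence = 0
    @mutex = Mutex.new
  end

  def next_id
    @mutex.synchronize do
      now = @clock.call
      raise "clock moved backwards" if now < @last_ms

      if now == @last_ms
        @sequence = (@sequence + 1) & MAX_SEQUENCE
        # Sequence exhausted in this millisecond: spin until the clock ticks.
        now = @clock.call while @sequence.zero? && now <= @last_ms
      else
        @sequence = 0
      end
      @last_ms = now

      # Timestamp in the high bits makes ids roughly sortable by creation time.
      ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)) |
        (@worker_id << SEQUENCE_BITS) |
        @sequence
    end
  end
end
```

Within one worker the ids are strictly increasing; across workers they interleave in roughly time order, which is what k-sortable means here.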
One more service Do we really want that?
(ノಠ益ಠ)ノ⼺彡┻━┻
MySQL config
auto_increment_increment Controls the interval between successive column values.
auto_increment_offset Determines the starting point for the AUTO_INCREMENT column value.
next_id = auto_increment_offset + N × auto_increment_increment, where N is a member of {1, 2, 3, 4, ...}
Fix the increment, use a different offset per shard
shard     increment,offset   ids
shard_0   4,1                5, 9, 13, 17
shard_1   4,2                6, 10, 14, 18
shard_2   4,3                7, 11, 15, 19
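A quick way to check the per-shard id series is to generate them from the same rule. This Ruby sketch only reproduces the arithmetic; it does not talk to MySQL, and the constant and method names are made up for illustration.

```ruby
# Reproduces the per-shard id series from MySQL's rule
#   next_id = auto_increment_offset + N * auto_increment_increment, N = 1, 2, 3, ...
# Names are illustrative only; nothing here talks to MySQL.
INCREMENT = 4 # the same auto_increment_increment on every shard

# One distinct auto_increment_offset per shard.
OFFSETS = { "shard_0" => 1, "shard_1" => 2, "shard_2" => 3 }.freeze

def ids_for(offset, increment: INCREMENT, count: 4)
  (1..count).map { |n| offset + n * increment }
end

series = OFFSETS.transform_values { |offset| ids_for(offset) }
series.each { |shard, ids| puts "#{shard}: #{ids.join(', ')}" }
# shard_0: 5, 9, 13, 17
# shard_1: 6, 10, 14, 18
# shard_2: 7, 11, 15, 19
```

Offsets that differ modulo the increment produce disjoint series, so no id is ever generated by two shards. The fixed increment also caps the number of offsets: with an increment of 4 only offsets 1 to 4 are usable, since MySQL ignores an offset larger than the increment.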
Make sure ids are big: Shopify/rails-bigint-pk
Connection switching: teach your app which DB to talk to
Block-based shard context:
  Sharding.with_shard(id) do
    # stuff
  end
  Sharding.with_shop(shop) do
    # stuff
  end
  def do_stuff_in_shard(shard_id)
    Sharding.with_shard(shard_id) do
      # db stuff happens here!!!
    end
  end
Within the block, sharded models use the current shard’s connection
Sharded models include Sharding::Concern and have a shop_id column
  class Order < ActiveRecord::Base
    include Sharding::Concern
  end
Non-sharded models do not care: they always talk to the master DB
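A block-based shard context boils down to "set the current shard, run the block, restore the previous shard even on error". This is a minimal Ruby sketch of that idea, not Shopify's implementation: `ShardContext`, its `Shop` struct and `connection_name` are hypothetical, the shard is kept in `Thread.current` (fiber-local in Ruby), and real ActiveRecord connection swapping is not shown.

```ruby
# Minimal sketch of a block-based shard context in the spirit of
# Sharding.with_shard / Sharding.with_shop. ASSUMPTIONS: all names are
# hypothetical; the current shard lives in Thread.current (fiber-local).
module ShardContext
  Shop = Struct.new(:id, :shard_id)

  KEY = :current_shard_id

  def self.current_shard_id
    Thread.current[KEY]
  end

  # Runs the block with shard_id selected, restoring the outer shard afterwards.
  def self.with_shard(shard_id)
    previous = Thread.current[KEY]
    Thread.current[KEY] = shard_id
    yield
  ensure
    Thread.current[KEY] = previous
  end

  # A shop already knows its shard, so this just delegates.
  def self.with_shop(shop, &block)
    with_shard(shop.shard_id, &block)
  end

  # What a sharded model would consult to pick its connection; non-sharded
  # models would skip this and always use the master DB.
  def self.connection_name
    shard_id = current_shard_id
    raise "no shard selected; wrap the call in with_shard" if shard_id.nil?

    "shard_#{shard_id}"
  end
end

shop = ShardContext::Shop.new(42, 2)
puts ShardContext.with_shop(shop) { ShardContext.connection_name } # prints "shard_2"
```

Restoring the previous value in `ensure`, instead of clearing it, is what makes nested blocks safe.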
Connection switching
HTTP requests
Domain to shop
Shop to shard: every shop has a shard_id
Now run the request in a shard
  around_action :with_shop

  def with_shop
    @shop = ShopManager.shop_for(request.host)
    Sharding.with_shop(@shop) do
      yield # Actual request happens here.
    end
  end
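The request path can be sketched in plain Ruby: look up the shop by host, read its shard_id, and run the request body inside that shard. `SHOPS_BY_HOST` stands in for the master-DB lookup behind `ShopManager.shop_for`; it, the hosts, and `handle_request` are hypothetical, and the current shard is assumed to live in `Thread.current[:shard_id]`.

```ruby
# Sketch of per-request shard selection: domain -> shop -> shard, then the
# request runs inside that shard. ASSUMPTIONS: every name and host here is
# hypothetical; the current shard lives in Thread.current[:shard_id].
Shop = Struct.new(:id, :host, :shard_id)

SHOPS_BY_HOST = {
  "snowboards.example.com" => Shop.new(1, "snowboards.example.com", 0),
  "bicycles.example.com"   => Shop.new(2, "bicycles.example.com", 1)
}.freeze

class ShopNotFound < StandardError; end

# Domain to shop (a non-sharded lookup, always on the master DB).
def shop_for(host)
  SHOPS_BY_HOST.fetch(host) { raise ShopNotFound, host }
end

# Shop to shard: every shop carries its shard_id.
def with_shop(shop)
  previous = Thread.current[:shard_id]
  Thread.current[:shard_id] = shop.shard_id
  yield
ensure
  Thread.current[:shard_id] = previous
end

# Plays the role of the around_action wrapping a controller action.
def handle_request(host)
  shop = shop_for(host)
  with_shop(shop) do
    # The actual request happens here; sharded queries would hit this shard.
    "shop #{shop.id} served from shard_#{Thread.current[:shard_id]}"
  end
end

puts handle_request("bicycles.example.com") # prints "shop 2 served from shard_1"
```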
Background jobs
If params[:shop_id] is present: easy
  class CommentSpamCheckerJob
    include BackgroundQueue::Low
    include Sharding::BackgroundQueue::SelectShard
    include BackgroundQueue::Locking

    # perform will have the shard context of
    # params[:shop_id]
  end
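One way a job mixin could give `perform` the shard context of `params[:shop_id]` is to wrap it. The slide uses `include`; this Ruby sketch uses `prepend` so the wrapper runs around the class's own `perform`, since the talk does not show how the real module hooks in. `SelectShardSketch` and `SHARD_FOR_SHOP` are hypothetical stand-ins for `Sharding::BackgroundQueue::SelectShard` and the master-DB lookup.

```ruby
# Sketch of a job mixin that runs #perform in the shard of params[:shop_id].
# ASSUMPTIONS: all names are hypothetical; the current shard lives in
# Thread.current[:shard_id].
SHARD_FOR_SHOP = { 42 => 2, 7 => 0 }.freeze

module SelectShardSketch
  # Prepended, so this #perform wraps the job class's own #perform.
  def perform(params)
    shop_id = params[:shop_id]
    return super if shop_id.nil? # no shop given: run without a shard context

    previous = Thread.current[:shard_id]
    Thread.current[:shard_id] = SHARD_FOR_SHOP.fetch(shop_id)
    begin
      super
    ensure
      Thread.current[:shard_id] = previous # restore for the next job on this worker
    end
  end
end

class CommentSpamCheckerJob
  prepend SelectShardSketch

  def perform(params)
    "checking spam for shop #{params[:shop_id]} on shard #{Thread.current[:shard_id].inspect}"
  end
end

puts CommentSpamCheckerJob.new.perform({ shop_id: 42 })
# prints "checking spam for shop 42 on shard 2"
```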
What about JOINs and Transactions? :(
Can’t just iterate: a shop can be locked, and there can be extraneous rows left over from moves
Take shared locks while iterating: the caller decides what to do with locked shops (ignore or raise)
They also ignore orphaned data: it will get deleted, but that might take hours
Primitives
query_on_each_shard: results from an ActiveRecord::Relation on each shard
  things = []
  rel = Model.preload(:checkouts, :order).where(id: params[:id].split('+'))
  Sharding.query_on_each_shard(rel, :on_lock => :raise) do |recs|
    things.concat(recs)
  end
find_in_batches_on_each_shard: like find_each, except it queries every shard
  r = model.unscoped
  Sharding.find_in_batches_on_each_shard(r, on_lock: :ignore) do |recs|
    recs.each do |record|
      reserialize_record(record, params[:fields])
      progress.tick
    end
  end
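The iteration rules (skip orphaned rows, let the caller choose between raising and ignoring for locked shops) can be sketched over in-memory data. This is a hypothetical stand-in, not the real `Sharding.query_on_each_shard`: shards are arrays of row hashes, `SHOP_SHARD` plays the authoritative shop-to-shard map on the master DB, `LOCKED_SHOPS` models `shop.is_locked`, and the per-shop shared locks are left out.

```ruby
# Sketch of query_on_each_shard-style iteration. ASSUMPTIONS: all names and
# data are hypothetical; shared locks are not modeled.
class ShopLocked < StandardError; end

SHOP_SHARD   = { 1 => 0, 2 => 1, 3 => 1 }.freeze
LOCKED_SHOPS = [3].freeze

SHARDS = {
  # id 9 on shard 0 is a stale copy left behind after shop 2 moved to shard 1.
  0 => [{ id: 5, shop_id: 1 }, { id: 9, shop_id: 2 }],
  1 => [{ id: 9, shop_id: 2 }, { id: 6, shop_id: 3 }]
}.freeze

def query_on_each_shard(shards, on_lock: :raise)
  shards.each do |shard_id, rows|
    # Ignore orphans: a row only counts on the shard its shop currently lives on.
    live = rows.select { |row| SHOP_SHARD[row[:shop_id]] == shard_id }
    locked, unlocked = live.partition { |row| LOCKED_SHOPS.include?(row[:shop_id]) }
    if locked.any? && on_lock == :raise
      raise ShopLocked, "shop #{locked.first[:shop_id]} is locked"
    end

    yield unlocked # with on_lock: :ignore, rows of locked shops are skipped
  end
end

things = []
query_on_each_shard(SHARDS, on_lock: :ignore) { |recs| things.concat(recs) }
ids = things.map { |row| row[:id] }
p ids # prints [5, 9]
```

With `on_lock: :raise` this sketch only raises when it reaches shard 1, after shard 0 has already been yielded, so callers that collect results should be ready for partial output.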
If someone absolutely must run custom SQL: take the global lock. Also, do not do this.
Rebalancing: how do we move things across shards?
Lock-and-move
Global exclusive lock: used by maintenance tasks, some cron jobs and the shop mover
shop.is_locked: no need to reach a locking server, just return HTTP 503 if set
Per-shop shared lock: taken by the cross-shard query methods as they iterate; important for anything that spans shards
ZooKeeper or Redis
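The three locks can be sketched in-process to show who blocks whom. A `Mutex` stands in for ZooKeeper or Redis and a `Set` for the `shop.is_locked` flag; `LockSketch` and its method names are hypothetical, and refusing shared locks for a locked shop is an assumption consistent with the slides rather than a documented API.

```ruby
require "set"

# In-process sketch of the three lock kinds. ASSUMPTIONS: a Mutex stands in
# for ZooKeeper/Redis, a Set for shop.is_locked; all names are hypothetical.
class LockSketch
  def initialize
    @mutex = Mutex.new
    @global_owner = nil
    @locked_shops = Set.new
    @shared = Hash.new(0) # shop_id => number of shared-lock holders
  end

  # Global exclusive lock: "take global lock or bail", so it never blocks.
  def try_global(owner)
    @mutex.synchronize do
      return false if @global_owner

      @global_owner = owner
      true
    end
  end

  def release_global(owner)
    @mutex.synchronize { @global_owner = nil if @global_owner == owner }
  end

  # shop.is_locked: a plain flag on the shop; requests check it and answer 503.
  def set_shop_locked(shop_id, locked)
    @mutex.synchronize { locked ? @locked_shops.add(shop_id) : @locked_shops.delete(shop_id) }
  end

  def http_status(shop_id)
    @mutex.synchronize { @locked_shops.include?(shop_id) ? 503 : 200 }
  end

  # Per-shop shared lock for cross-shard iterators; refused while the shop is locked.
  def try_shared(shop_id)
    @mutex.synchronize do
      return false if @locked_shops.include?(shop_id)

      @shared[shop_id] += 1
      true
    end
  end

  def release_shared(shop_id)
    @mutex.synchronize do
      @shared[shop_id] -= 1
      @shared.delete(shop_id) unless @shared[shop_id].positive?
    end
  end

  # The shop mover polls this before copying the shop's data.
  def shared_released?(shop_id)
    @mutex.synchronize { @shared[shop_id].zero? }
  end
end
```

A job that took its shared lock before the shop was locked keeps it; new iterators are turned away, and the mover waits for `shared_released?` before copying.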
$ script/move_shop --shop_id=42 \
    --dest_id=2 --concurrency=16
T0: take global lock or bail; shop.is_locked = true
T1 (+100 secs): Unicorns dead
T2: jobs have released shared locks
T3: fork, copy, verify
T4: change shop.shard_id
T5: shop.is_locked = false
What about the old data? It is deleted offline
Why not online? Too hard
Sharding checklist
• How to slice data
• Primary key generation
• Connection switching
• Query across shards
• Rebalancing
Results: trial by fire, Black Friday/Cyber Monday 2013
~100k RPM higher than day-to-day
QPS: 10-15k higher
All those spikes: flash sales
~80ms avg response time, zero downtime, all-time record sales for one day
Thanks! Thanks @camilolopez for the slides