Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Sharding is hard.
Search
Rafał Malinowski
March 15, 2014
Programming
3
370
Sharding is hard.
Rafał Malinowski
March 15, 2014
Tweet
Share
Other Decks in Programming
See All in Programming
Node-REDのノードの開発・活用事例とコミュニティとの関わり(Node-RED Con Nagoya 2025)
404background
0
110
Migration to Signals, Resource API, and NgRx Signal Store
manfredsteyer
PRO
0
130
Introduce Hono CLI
yusukebe
6
3.2k
Designing Repeatable Edits: The Architecture of . in Vim
satorunooshie
0
180
ドメイン駆動設計のエッセンス
masuda220
PRO
15
6.7k
テーブル定義書の構造化抽出して、生成AIでDWH分析を試してみた / devio2025tokyo
kasacchiful
0
340
Researchlyの開発で参考にしたデザイン
adsholoko
0
100
CSC305 Lecture 11
javiergs
PRO
0
320
Vueのバリデーション、結局どれを選べばいい? ― 自作バリデーションの限界と、脱却までの道のり ― / Which Vue Validation Library Should We Really Use? The Limits of Self-Made Validation and How I Finally Moved On
neginasu
3
1.7k
バッチ処理を「状態の記録」から「事実の記録」へ
panda728
PRO
0
210
O Que É e Como Funciona o PHP-FPM?
marcelgsantos
0
240
CSC305 Lecture 10
javiergs
PRO
0
320
Featured
See All Featured
Build your cross-platform service in a week with App Engine
jlugia
234
18k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
36
6.1k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
1.7k
Automating Front-end Workflow
addyosmani
1371
200k
Six Lessons from altMBA
skipperchong
29
4k
The Cult of Friendly URLs
andyhume
79
6.6k
Building Applications with DynamoDB
mza
96
6.7k
The Illustrated Children's Guide to Kubernetes
chrisshort
51
51k
Large-scale JavaScript Application Architecture
addyosmani
514
110k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
10
630
Building a Modern Day E-commerce SEO Strategy
aleyda
44
7.9k
Transcript
SHARDING server #1 server # server #
Database is still main problem of web app
SCALABILITY !
ALTER TABLE ADD INDEX
JOIN is evil
denormalize users_links user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl
… … select count(*) from users_links where user_id=1; users user_id link 1
[email protected]
users_links user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl …
… select links_count from users where user_id=1; users user_id link links_count 1
[email protected]
100. 000 denormalize
memcached alternative PHP cache xcache redis
sphinx search lucene ! solr elastic search
Replication master slave slave SELECT UPDATE DELETE INSERT SELECT SELECT
Replication master master master SELECT UPDATE DELETE INSERT SELECT UPDATE
DELETE INSERT SELECT UPDATE DELETE INSERT
Replication is harder than you think!
know your tools
Rafał Malinowski @afterdesign
None
None
None
None
None
SHARDING
Sharding is a type of database partitioning that separates very
large databases into smaller parts called data shards The Internet
user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl
user_id link 1 phpers.pl user_id link 1 meetphp.pl user_id link
1 php3city.pl
None
How to shard ?
modulo
key % shards count = server id
sh.st/1
crc32('1') % 3 = 2
Sharding is hard. Resharding is harder !
crc32('1') % 3 = 2 crc32('1') % 4 = 3
db0 db1 4 5 6 7 0 1 2 3
8 9 db0 db2 db1
db0 db1 4 5 6 7 0 1 2 3
8 9 db0 db2 db1 db2 db3
shards moved rows 2 → 3 667 402 2 →
4 500 000 2 → 5 800 468 3 → 4 750 357 3 → 5 800 060 4 → 5 800 247
key map
shard_X url_id goto date
shards_counters shard_id count 1 0 2 0 shard_X url_id goto
date
shard_X url_id goto date shards_counters shard_id count 1 0 2
0 sharding_map url_id shard_id
meetphp.pl
shards_counters shard_id count 1 0 2 0 SELECT `shard_id` FROM
`shards_counters` ORDER BY `count` LIMIT 1;
shards_counters shard_id count 1 1 2 0 meetphp.pl sharding_map url_id
shard_id 1 1 sh.st/1
shards_counters shard_id count 1 1 2 0 shard_1.urls url_id goto
date 1 meetphp.pl 2014-01-01 sharding_map url_id shard_id 1 1
shards_counters shard_id count 1 1 2 0 SELECT `shard_id` FROM
`sharding_map` WHERE `url_id` = 1; sharding_map url_id shard_id 1 1
SELECT * FROM `urls` WHERE `url_id` = 1; shard_1.urls url_id
goto date 1 meetphp.pl 2014-01-01
phpers.pl
shards_counters shard_id count 1 1 2 0 SELECT `shard_id` FROM
`shards_counters` ORDER BY `count` LIMIT 1;
shards_counters shard_id count 1 1 2 1 sharding_map url_id shard_id
1 1 2 2 phpers.pl sh.st/2
shards_counters shard_id count 1 1 2 1 shard_2.urls url_id goto
date 2 phpers.pl 2014-01-01 sharding_map url_id shard_id 1 1 2 2
shards_counters shard_id count 1 1 2 1 SELECT `shard_id` FROM
`sharding_map` WHERE `url_id` = 2; sharding_map url_id shard_id 1 1 2 2
SELECT * FROM `shard_2.urls ` WHERE `url_id` = 2; shard_2.urls
url_id goto date 2 phpers.pl 2014-01-01
Key is unique throughout system
Adding shards ?
consistent hashing
0 max(int) shard_3 crc32('127.0.0.1') = 3619153832 shard_2 crc32('127.0.0.2') = 1321121298
shard_1 crc32('127.0.0.3') = 968459908
0 max(int) meetphp.pl crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') = 1321121298 crc32('127.0.0.3')
= 968459908
0 max(int) crc32('meetphp.pl') = 3254572804 crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') =
1321121298 crc32('127.0.0.3') = 968459908 meetphp.pl
sh.st/3254572804 meetphp.pl crc32('meetphp.pl') = 3254572804
shard_3.urls url_id goto date 3254572804 meetphp.pl 2014-01-01 crc32('127.0.0.1') = 3619153832
0 max(int) crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') = 1321121298 crc32('127.0.0.3') =
968459908 crc32('phpers.pl') = 42311941 phpers.pl
sh.st/42311941 phpers.pl crc32('phpers.pl') = 42311941
shard_1.urls url_id goto date 42311941 phpers.pl 2014-01-01 crc32('127.0.0.3') = 968459908
Searching is expensive
ketama
0 max(int) crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3')
buckets = {}
$bucketsCount = 1024; ! for ($i = 0; $i <
$bucketsCount; $i++) { buckets[i] = max(int) / $bucketsCount * i }
crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') 0 max(int) buckets[0] = 127.0.0.2
0 max(int) crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') buckets[261] = 127.0.0.1
crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') 0 max(int) buckets[772] = 127.0.0.1
meetphp.pl crc32('meetphp.pl') = 3254572804 3254572804 % 1024 = 772
buckets[772] = shard 3
phpers.pl crc32('phpers.pl') = 42311941 42311941 % 1024 = 261
buckets[261] = shard 3
shard_3.urls url_id goto date 1 meetphp.pl 2014-01-01 2 phpers.pl 2014-01-01
resharding ?
1 6
1 6 2
1 6 2 3
1 6 2 3
data distribution
oversharding
problems sharding implementation ? which tables shard ? how big
are our tables ?
problems data count pagers/lists
sharding count insert/select one of the last things to do
Rafał Malinowski @afterdesign