Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Sharding is hard.
Search
Rafał Malinowski
March 15, 2014
Programming
3
370
Sharding is hard.
Rafał Malinowski
March 15, 2014
Tweet
Share
Other Decks in Programming
See All in Programming
AIでLINEスタンプを作ってみた
eycjur
1
230
AI Agents: How Do They Work and How to Build Them @ Shift 2025
slobodan
0
100
@Environment(\.keyPath)那么好我不允许你们不知道! / atEnvironment keyPath is so good and you should know it!
lovee
0
130
実用的なGOCACHEPROG実装をするために / golang.tokyo #40
mazrean
1
300
Performance for Conversion! 分散トレーシングでボトルネックを 特定せよ
inetand
0
3.4k
GitHubとGitLabとAWS CodePipelineでCI/CDを組み比べてみた
satoshi256kbyte
4
250
Swift Updates - Learn Languages 2025
koher
2
510
さようなら Date。 ようこそTemporal! 3年間先行利用して得られた知見の共有
8beeeaaat
3
1.5k
2025 年のコーディングエージェントの現在地とエンジニアの仕事の変化について
azukiazusa1
24
12k
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
420
AWS発のAIエディタKiroを使ってみた
iriikeita
1
190
Flutter with Dart MCP: All You Need - 박제창 2025 I/O Extended Busan
itsmedreamwalker
0
150
Featured
See All Featured
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.1k
How to train your dragon (web standard)
notwaldorf
96
6.2k
Typedesign – Prime Four
hannesfritz
42
2.8k
Practical Orchestrator
shlominoach
190
11k
Build your cross-platform service in a week with App Engine
jlugia
231
18k
4 Signs Your Business is Dying
shpigford
184
22k
KATA
mclloyd
32
14k
[RailsConf 2023] Rails as a piece of cake
palkan
57
5.8k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
127
53k
Transcript
SHARDING server #1 server # server #
Database is still main problem of web app
SCALABILITY !
ALTER TABLE ADD INDEX
JOIN is evil
denormalize users_links user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl
… … select count(*) from users_links where user_id=1; users user_id link 1
[email protected]
users_links user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl …
… select links_count from users where user_id=1; users user_id link links_count 1
[email protected]
100. 000 denormalize
memcached alternative PHP cache xcache redis
sphinx search lucene ! solr elastic search
Replication master slave slave SELECT UPDATE DELETE INSERT SELECT SELECT
Replication master master master SELECT UPDATE DELETE INSERT SELECT UPDATE
DELETE INSERT SELECT UPDATE DELETE INSERT
Replication is harder than you think!
know your tools
Rafał Malinowski @afterdesign
None
None
None
None
None
SHARDING
Sharding is a type of database partitioning that separates very
large databases into smaller parts called data shards The Internet
user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl
user_id link 1 phpers.pl user_id link 1 meetphp.pl user_id link
1 php3city.pl
None
How to shard ?
modulo
key % shards count = server id
sh.st/1
crc32('1') % 3 = 2
Sharding is hard. Resharding is harder !
crc32('1') % 3 = 2 crc32('1') % 4 = 3
db0 db1 4 5 6 7 0 1 2 3
8 9 db0 db2 db1
db0 db1 4 5 6 7 0 1 2 3
8 9 db0 db2 db1 db2 db3
shards moved rows 2 → 3 667 402 2 →
4 500 000 2 → 5 800 468 3 → 4 750 357 3 → 5 800 060 4 → 5 800 247
key map
shard_X url_id goto date
shards_counters shard_id count 1 0 2 0 shard_X url_id goto
date
shard_X url_id goto date shards_counters shard_id count 1 0 2
0 sharding_map url_id shard_id
meetphp.pl
shards_counters shard_id count 1 0 2 0 SELECT `shard_id` FROM
`shards_counters` ORDER BY `count` LIMIT 1;
shards_counters shard_id count 1 1 2 0 meetphp.pl sharding_map url_id
shard_id 1 1 sh.st/1
shards_counters shard_id count 1 1 2 0 shard_1.urls url_id goto
date 1 meetphp.pl 2014-01-01 sharding_map url_id shard_id 1 1
shards_counters shard_id count 1 1 2 0 SELECT `shard_id` FROM
`sharding_map` WHERE `url_id` = 1; sharding_map url_id shard_id 1 1
SELECT * FROM `urls` WHERE `url_id` = 1; shard_1.urls url_id
goto date 1 meetphp.pl 2014-01-01
phpers.pl
shards_counters shard_id count 1 1 2 0 SELECT `shard_id` FROM
`shards_counters` ORDER BY `count` LIMIT 1;
shards_counters shard_id count 1 1 2 1 sharding_map url_id shard_id
1 1 2 2 phpers.pl sh.st/2
shards_counters shard_id count 1 1 2 1 shard_2.urls url_id goto
date 2 phpers.pl 2014-01-01 sharding_map url_id shard_id 1 1 2 2
shards_counters shard_id count 1 1 2 1 SELECT `shard_id` FROM
`sharding_map` WHERE `url_id` = 2; sharding_map url_id shard_id 1 1 2 2
SELECT * FROM `shard_2.urls ` WHERE `url_id` = 2; shard_2.urls
url_id goto date 2 phpers.pl 2014-01-01
Key is unique throughout system
Adding shards ?
consistent hashing
0 max(int) shard_3 crc32('127.0.0.1') = 3619153832 shard_2 crc32('127.0.0.2') = 1321121298
shard_1 crc32('127.0.0.3') = 968459908
0 max(int) meetphp.pl crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') = 1321121298 crc32('127.0.0.3')
= 968459908
0 max(int) crc32('meetphp.pl') = 3254572804 crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') =
1321121298 crc32('127.0.0.3') = 968459908 meetphp.pl
sh.st/3254572804 meetphp.pl crc32('meetphp.pl') = 3254572804
shard_3.urls url_id goto date 3254572804 meetphp.pl 2014-01-01 crc32('127.0.0.1') = 3619153832
0 max(int) crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') = 1321121298 crc32('127.0.0.3') =
968459908 crc32('phpers.pl') = 42311941 phpers.pl
sh.st/42311941 phpers.pl crc32('phpers.pl') = 42311941
shard_1.urls url_id goto date 42311941 phpers.pl 2014-01-01 crc32('127.0.0.3') = 968459908
Searching is expensive
ketama
0 max(int) crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3')
buckets = {}
$bucketsCount = 1024; ! for ($i = 0; $i <
$bucketsCount; $i++) { buckets[i] = max(int) / $bucketsCount * i }
crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') 0 max(int) buckets[0] = 127.0.0.2
0 max(int) crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') buckets[261] = 127.0.0.1
crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') 0 max(int) buckets[772] = 127.0.0.1
meetphp.pl crc32('meetphp.pl') = 3254572804 3254572804 % 1024 = 772
buckets[772] = shard 3
phpers.pl crc32('phpers.pl') = 42311941 42311941 % 1024 = 261
buckets[261] = shard 3
shard_3.urls url_id goto date 1 meetphp.pl 2014-01-01 2 phpers.pl 2014-01-01
resharding ?
1 6
1 6 2
1 6 2 3
1 6 2 3
data distribution
oversharding
problems sharding implementation ? which tables shard ? how big
are our tables ?
problems data count pagers/lists
sharding count insert/select one of the last things to do
Rafał Malinowski @afterdesign