$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Sharding is hard.
Search
Rafał Malinowski
March 15, 2014
Programming
3
370
Sharding is hard.
Rafał Malinowski
March 15, 2014
Tweet
Share
Other Decks in Programming
See All in Programming
TestingOsaka6_Ozono
o3
0
170
Tinkerbellから学ぶ、Podで DHCPをリッスンする手法
tomokon
0
140
DevFest Android in Korea 2025 - 개발자 커뮤니티를 통해 얻는 가치
wisemuji
0
160
AIエンジニアリングのご紹介 / Introduction to AI Engineering
rkaga
8
3.1k
Deno Tunnel を使ってみた話
kamekyame
0
160
リリース時」テストから「デイリー実行」へ!開発マネージャが取り組んだ、レガシー自動テストのモダン化戦略
goataka
0
130
非同期処理の迷宮を抜ける: 初学者がつまづく構造的な原因
pd1xx
1
740
Rubyで鍛える仕組み化プロヂュース力
muryoimpl
0
150
S3 VectorsとStrands Agentsを利用したAgentic RAGシステムの構築
tosuri13
6
360
AIの誤りが許されない業務システムにおいて“信頼されるAI” を目指す / building-trusted-ai-systems
yuya4
6
3.8k
AIコーディングエージェント(NotebookLM)
kondai24
0
210
愛される翻訳の秘訣
kishikawakatsumi
3
330
Featured
See All Featured
Future Trends and Review - Lecture 12 - Web Technologies (1019888BNR)
signer
PRO
0
3.1k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
22k
The Curious Case for Waylosing
cassininazir
0
190
The browser strikes back
jonoalderson
0
64
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
170
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.1k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Claude Code のすすめ
schroneko
65
200k
<Decoding/> the Language of Devs - We Love SEO 2024
nikkihalliwell
0
98
How to audit for AI Accessibility on your Front & Back End
davetheseo
0
120
Designing for Timeless Needs
cassininazir
0
86
Transcript
SHARDING server #1 server # server #
Database is still main problem of web app
SCALABILITY !
ALTER TABLE ADD INDEX
JOIN is evil
denormalize users_links user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl
… … select count(*) from users_links where user_id=1; users user_id link 1
[email protected]
users_links user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl …
… select links_count from users where user_id=1; users user_id link links_count 1
[email protected]
100. 000 denormalize
memcached alternative PHP cache xcache redis
sphinx search lucene ! solr elastic search
Replication master slave slave SELECT UPDATE DELETE INSERT SELECT SELECT
Replication master master master SELECT UPDATE DELETE INSERT SELECT UPDATE
DELETE INSERT SELECT UPDATE DELETE INSERT
Replication is harder than you think!
know your tools
Rafał Malinowski @afterdesign
None
None
None
None
None
SHARDING
Sharding is a type of database partitioning that separates very
large databases into smaller parts called data shards The Internet
user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl
user_id link 1 phpers.pl user_id link 1 meetphp.pl user_id link
1 php3city.pl
None
How to shard ?
modulo
key % shards count = server id
sh.st/1
crc32('1') % 3 = 2
Sharding is hard. Resharding is harder !
crc32('1') % 3 = 2 crc32('1') % 4 = 3
db0 db1 4 5 6 7 0 1 2 3
8 9 db0 db2 db1
db0 db1 4 5 6 7 0 1 2 3
8 9 db0 db2 db1 db2 db3
shards moved rows 2 → 3 667 402 2 →
4 500 000 2 → 5 800 468 3 → 4 750 357 3 → 5 800 060 4 → 5 800 247
key map
shard_X url_id goto date
shards_counters shard_id count 1 0 2 0 shard_X url_id goto
date
shard_X url_id goto date shards_counters shard_id count 1 0 2
0 sharding_map url_id shard_id
meetphp.pl
shards_counters shard_id count 1 0 2 0 SELECT `shard_id` FROM
`shards_counters` ORDER BY `count` LIMIT 1;
shards_counters shard_id count 1 1 2 0 meetphp.pl sharding_map url_id
shard_id 1 1 sh.st/1
shards_counters shard_id count 1 1 2 0 shard_1.urls url_id goto
date 1 meetphp.pl 2014-01-01 sharding_map url_id shard_id 1 1
shards_counters shard_id count 1 1 2 0 SELECT `shard_id` FROM
`sharding_map` WHERE `url_id` = 1; sharding_map url_id shard_id 1 1
SELECT * FROM `urls` WHERE `url_id` = 1; shard_1.urls url_id
goto date 1 meetphp.pl 2014-01-01
phpers.pl
shards_counters shard_id count 1 1 2 0 SELECT `shard_id` FROM
`shards_counters` ORDER BY `count` LIMIT 1;
shards_counters shard_id count 1 1 2 1 sharding_map url_id shard_id
1 1 2 2 phpers.pl sh.st/2
shards_counters shard_id count 1 1 2 1 shard_2.urls url_id goto
date 2 phpers.pl 2014-01-01 sharding_map url_id shard_id 1 1 2 2
shards_counters shard_id count 1 1 2 1 SELECT `shard_id` FROM
`sharding_map` WHERE `url_id` = 2; sharding_map url_id shard_id 1 1 2 2
SELECT * FROM `shard_2.urls ` WHERE `url_id` = 2; shard_2.urls
url_id goto date 2 phpers.pl 2014-01-01
Key is unique throughout system
Adding shards ?
consistent hashing
0 max(int) shard_3 crc32('127.0.0.1') = 3619153832 shard_2 crc32('127.0.0.2') = 1321121298
shard_1 crc32('127.0.0.3') = 968459908
0 max(int) meetphp.pl crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') = 1321121298 crc32('127.0.0.3')
= 968459908
0 max(int) crc32('meetphp.pl') = 3254572804 crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') =
1321121298 crc32('127.0.0.3') = 968459908 meetphp.pl
sh.st/3254572804 meetphp.pl crc32('meetphp.pl') = 3254572804
shard_3.urls url_id goto date 3254572804 meetphp.pl 2014-01-01 crc32('127.0.0.1') = 3619153832
0 max(int) crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') = 1321121298 crc32('127.0.0.3') =
968459908 crc32('phpers.pl') = 42311941 phpers.pl
sh.st/42311941 phpers.pl crc32('phpers.pl') = 42311941
shard_1.urls url_id goto date 42311941 phpers.pl 2014-01-01 crc32('127.0.0.3') = 968459908
Searching is expensive
ketama
0 max(int) crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3')
buckets = {}
$bucketsCount = 1024; ! for ($i = 0; $i <
$bucketsCount; $i++) { buckets[i] = max(int) / $bucketsCount * i }
crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') 0 max(int) buckets[0] = 127.0.0.2
0 max(int) crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') buckets[261] = 127.0.0.1
crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') 0 max(int) buckets[772] = 127.0.0.1
meetphp.pl crc32('meetphp.pl') = 3254572804 3254572804 % 1024 = 772
buckets[772] = shard 3
phpers.pl crc32('phpers.pl') = 42311941 42311941 % 1024 = 261
buckets[261] = shard 3
shard_3.urls url_id goto date 1 meetphp.pl 2014-01-01 2 phpers.pl 2014-01-01
resharding ?
1 6
1 6 2
1 6 2 3
1 6 2 3
data distribution
oversharding
problems sharding implementation ? which tables shard ? how big
are our tables ?
problems data count pagers/lists
sharding count insert/select one of the last things to do
Rafał Malinowski @afterdesign