Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Sharding is hard.
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Rafał Malinowski
March 15, 2014
Programming
3
370
Sharding is hard.
Rafał Malinowski
March 15, 2014
Tweet
Share
Other Decks in Programming
See All in Programming
20260127_試行錯誤の結晶を1冊に。著者が解説 先輩データサイエンティストからの指南書 / author's_commentary_ds_instructions_guide
nash_efp
1
940
Spinner 軸ズレ現象を調べたらレンダリング深淵に飲まれた #レバテックMeetup
bengo4com
1
230
2026年 エンジニアリング自己学習法
yumechi
0
130
副作用をどこに置くか問題:オブジェクト指向で整理する設計判断ツリー
koxya
1
600
AgentCoreとHuman in the Loop
har1101
5
230
疑似コードによるプロンプト記述、どのくらい正確に実行される?
kokuyouwind
0
380
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
540
dchart: charts from deck markup
ajstarks
3
990
AIによる高速開発をどう制御するか? ガードレール設置で開発速度と品質を両立させたチームの事例
tonkotsuboy_com
7
2.2k
AIで開発はどれくらい加速したのか?AIエージェントによるコード生成を、現場の評価と研究開発の評価の両面からdeep diveしてみる
daisuketakeda
1
2.4k
Basic Architectures
denyspoltorak
0
670
CSC307 Lecture 01
javiergs
PRO
0
690
Featured
See All Featured
How to make the Groovebox
asonas
2
1.9k
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
220
Self-Hosted WebAssembly Runtime for Runtime-Neutral Checkpoint/Restore in Edge–Cloud Continuum
chikuwait
0
320
The Power of CSS Pseudo Elements
geoffreycrofte
80
6.2k
From π to Pie charts
rasagy
0
120
How to audit for AI Accessibility on your Front & Back End
davetheseo
0
180
Agile Actions for Facilitating Distributed Teams - ADO2019
mkilby
0
110
Unsuck your backbone
ammeep
671
58k
Writing Fast Ruby
sferik
630
62k
Fireside Chat
paigeccino
41
3.8k
Information Architects: The Missing Link in Design Systems
soysaucechin
0
770
Kristin Tynski - Automating Marketing Tasks With AI
techseoconnect
PRO
0
140
Transcript
SHARDING server #1 server # server #
Database is still main problem of web app
SCALABILITY !
ALTER TABLE ADD INDEX
JOIN is evil
denormalize users_links user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl
… … select count(*) from users_links where user_id=1; users user_id link 1
[email protected]
users_links user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl …
… select links_count from users where user_id=1; users user_id link links_count 1
[email protected]
100. 000 denormalize
memcached alternative PHP cache xcache redis
sphinx search lucene ! solr elastic search
Replication master slave slave SELECT UPDATE DELETE INSERT SELECT SELECT
Replication master master master SELECT UPDATE DELETE INSERT SELECT UPDATE
DELETE INSERT SELECT UPDATE DELETE INSERT
Replication is harder than you think!
know your tools
Rafał Malinowski @afterdesign
None
None
None
None
None
SHARDING
Sharding is a type of database partitioning that separates very
large databases into smaller parts called data shards The Internet
user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl
user_id link 1 phpers.pl user_id link 1 meetphp.pl user_id link
1 php3city.pl
None
How to shard ?
modulo
key % shards count = server id
sh.st/1
crc32('1') % 3 = 2
Sharding is hard. Resharding is harder !
crc32('1') % 3 = 2 crc32('1') % 4 = 3
db0 db1 4 5 6 7 0 1 2 3
8 9 db0 db2 db1
db0 db1 4 5 6 7 0 1 2 3
8 9 db0 db2 db1 db2 db3
shards moved rows 2 → 3 667 402 2 →
4 500 000 2 → 5 800 468 3 → 4 750 357 3 → 5 800 060 4 → 5 800 247
key map
shard_X url_id goto date
shards_counters shard_id count 1 0 2 0 shard_X url_id goto
date
shard_X url_id goto date shards_counters shard_id count 1 0 2
0 sharding_map url_id shard_id
meetphp.pl
shards_counters shard_id count 1 0 2 0 SELECT `shard_id` FROM
`shards_counters` ORDER BY `count` LIMIT 1;
shards_counters shard_id count 1 1 2 0 meetphp.pl sharding_map url_id
shard_id 1 1 sh.st/1
shards_counters shard_id count 1 1 2 0 shard_1.urls url_id goto
date 1 meetphp.pl 2014-01-01 sharding_map url_id shard_id 1 1
shards_counters shard_id count 1 1 2 0 SELECT `shard_id` FROM
`sharding_map` WHERE `url_id` = 1; sharding_map url_id shard_id 1 1
SELECT * FROM `urls` WHERE `url_id` = 1; shard_1.urls url_id
goto date 1 meetphp.pl 2014-01-01
phpers.pl
shards_counters shard_id count 1 1 2 0 SELECT `shard_id` FROM
`shards_counters` ORDER BY `count` LIMIT 1;
shards_counters shard_id count 1 1 2 1 sharding_map url_id shard_id
1 1 2 2 phpers.pl sh.st/2
shards_counters shard_id count 1 1 2 1 shard_2.urls url_id goto
date 2 phpers.pl 2014-01-01 sharding_map url_id shard_id 1 1 2 2
shards_counters shard_id count 1 1 2 1 SELECT `shard_id` FROM
`sharding_map` WHERE `url_id` = 2; sharding_map url_id shard_id 1 1 2 2
SELECT * FROM `shard_2.urls ` WHERE `url_id` = 2; shard_2.urls
url_id goto date 2 phpers.pl 2014-01-01
Key is unique throughout system
Adding shards ?
consistent hashing
0 max(int) shard_3 crc32('127.0.0.1') = 3619153832 shard_2 crc32('127.0.0.2') = 1321121298
shard_1 crc32('127.0.0.3') = 968459908
0 max(int) meetphp.pl crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') = 1321121298 crc32('127.0.0.3')
= 968459908
0 max(int) crc32('meetphp.pl') = 3254572804 crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') =
1321121298 crc32('127.0.0.3') = 968459908 meetphp.pl
sh.st/3254572804 meetphp.pl crc32('meetphp.pl') = 3254572804
shard_3.urls url_id goto date 3254572804 meetphp.pl 2014-01-01 crc32('127.0.0.1') = 3619153832
0 max(int) crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') = 1321121298 crc32('127.0.0.3') =
968459908 crc32('phpers.pl') = 42311941 phpers.pl
sh.st/42311941 phpers.pl crc32('phpers.pl') = 42311941
shard_1.urls url_id goto date 42311941 phpers.pl 2014-01-01 crc32('127.0.0.3') = 968459908
Searching is expensive
ketama
0 max(int) crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3')
buckets = {}
$bucketsCount = 1024; ! for ($i = 0; $i <
$bucketsCount; $i++) { buckets[i] = max(int) / $bucketsCount * i }
crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') 0 max(int) buckets[0] = 127.0.0.2
0 max(int) crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') buckets[261] = 127.0.0.1
crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') 0 max(int) buckets[772] = 127.0.0.1
meetphp.pl crc32('meetphp.pl') = 3254572804 3254572804 % 1024 = 772
buckets[772] = shard 3
phpers.pl crc32('phpers.pl') = 42311941 42311941 % 1024 = 261
buckets[261] = shard 3
shard_3.urls url_id goto date 1 meetphp.pl 2014-01-01 2 phpers.pl 2014-01-01
resharding ?
1 6
1 6 2
1 6 2 3
1 6 2 3
data distribution
oversharding
problems sharding implementation ? which tables shard ? how big
are our tables ?
problems data count pagers/lists
sharding count insert/select one of the last things to do
Rafał Malinowski @afterdesign