Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Sharding is hard.
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Rafał Malinowski
March 15, 2014
Programming
3
370
Sharding is hard.
Rafał Malinowski
March 15, 2014
Tweet
Share
Other Decks in Programming
See All in Programming
PHPで TLSのプロトコルを実装してみる
higaki_program
0
290
AI 開発合宿を通して得た学び
niftycorp
PRO
0
150
「やめとこ」がなくなった — 1月にZennを始めて22本書いた AI共創開発のリアル
atani14
0
410
生成 AI 時代のスナップショットテストってやつを見せてあげますよ(α版)
ojun9
0
270
AI Assistants for Your Angular Solutions
manfredsteyer
PRO
0
150
How to stabilize UI tests using XCTest
akkeylab
0
130
PHP 7.4でもOpenTelemetryゼロコード計装がしたい! / PHPerKaigi 2026
arthur1
1
270
GC言語のWasm化とComponent Modelサポートの実践と課題 - Scalaの場合
tanishiking
0
120
DevinとClaude Code、SREの現場で使い倒してみた件
karia
1
1.1k
クライアントワークでSREをするということ。あるいは事業会社におけるSREと同じこと・違うこと
nnaka2992
1
350
AI駆動開発の本音 〜Claude Code並列開発で見えたエンジニアの新しい役割〜
hisuzuya
4
520
米国のサイバーセキュリティタイムラインと見る Goの暗号パッケージの進化
tomtwinkle
2
620
Featured
See All Featured
sira's awesome portfolio website redesign presentation
elsirapls
0
190
Being A Developer After 40
akosma
91
590k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
360
30k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
31
3.1k
Sam Torres - BigQuery for SEOs
techseoconnect
PRO
0
220
Redefining SEO in the New Era of Traffic Generation
szymonslowik
1
250
How STYLIGHT went responsive
nonsquared
100
6k
Designing for Timeless Needs
cassininazir
0
170
Impact Scores and Hybrid Strategies: The future of link building
tamaranovitovic
0
230
The Illustrated Children's Guide to Kubernetes
chrisshort
51
52k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Transcript
SHARDING server #1 server # server #
Database is still main problem of web app
SCALABILITY !
ALTER TABLE ADD INDEX
JOIN is evil
denormalize users_links user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl
… … select count(*) from users_links where user_id=1; users user_id link 1
[email protected]
users_links user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl …
… select links_count from users where user_id=1; users user_id link links_count 1
[email protected]
100. 000 denormalize
memcached alternative PHP cache xcache redis
sphinx search lucene ! solr elastic search
Replication master slave slave SELECT UPDATE DELETE INSERT SELECT SELECT
Replication master master master SELECT UPDATE DELETE INSERT SELECT UPDATE
DELETE INSERT SELECT UPDATE DELETE INSERT
Replication is harder than you think!
know your tools
Rafał Malinowski @afterdesign
None
None
None
None
None
SHARDING
Sharding is a type of database partitioning that separates very
large databases into smaller parts called data shards The Internet
user_id link 1 phpers.pl 1 meetphp.pl 1 php3city.pl
user_id link 1 phpers.pl user_id link 1 meetphp.pl user_id link
1 php3city.pl
None
How to shard ?
modulo
key % shards count = server id
sh.st/1
crc32('1') % 3 = 2
Sharding is hard. Resharding is harder !
crc32('1') % 3 = 2 crc32('1') % 4 = 3
db0 db1 4 5 6 7 0 1 2 3
8 9 db0 db2 db1
db0 db1 4 5 6 7 0 1 2 3
8 9 db0 db2 db1 db2 db3
shards moved rows 2 → 3 667 402 2 →
4 500 000 2 → 5 800 468 3 → 4 750 357 3 → 5 800 060 4 → 5 800 247
key map
shard_X url_id goto date
shards_counters shard_id count 1 0 2 0 shard_X url_id goto
date
shard_X url_id goto date shards_counters shard_id count 1 0 2
0 sharding_map url_id shard_id
meetphp.pl
shards_counters shard_id count 1 0 2 0 SELECT `shard_id` FROM
`shards_counters` ORDER BY `count` LIMIT 1;
shards_counters shard_id count 1 1 2 0 meetphp.pl sharding_map url_id
shard_id 1 1 sh.st/1
shards_counters shard_id count 1 1 2 0 shard_1.urls url_id goto
date 1 meetphp.pl 2014-01-01 sharding_map url_id shard_id 1 1
shards_counters shard_id count 1 1 2 0 SELECT `shard_id` FROM
`sharding_map` WHERE `url_id` = 1; sharding_map url_id shard_id 1 1
SELECT * FROM `urls` WHERE `url_id` = 1; shard_1.urls url_id
goto date 1 meetphp.pl 2014-01-01
phpers.pl
shards_counters shard_id count 1 1 2 0 SELECT `shard_id` FROM
`shards_counters` ORDER BY `count` LIMIT 1;
shards_counters shard_id count 1 1 2 1 sharding_map url_id shard_id
1 1 2 2 phpers.pl sh.st/2
shards_counters shard_id count 1 1 2 1 shard_2.urls url_id goto
date 2 phpers.pl 2014-01-01 sharding_map url_id shard_id 1 1 2 2
shards_counters shard_id count 1 1 2 1 SELECT `shard_id` FROM
`sharding_map` WHERE `url_id` = 2; sharding_map url_id shard_id 1 1 2 2
SELECT * FROM `shard_2.urls ` WHERE `url_id` = 2; shard_2.urls
url_id goto date 2 phpers.pl 2014-01-01
Key is unique throughout system
Adding shards ?
consistent hashing
0 max(int) shard_3 crc32('127.0.0.1') = 3619153832 shard_2 crc32('127.0.0.2') = 1321121298
shard_1 crc32('127.0.0.3') = 968459908
0 max(int) meetphp.pl crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') = 1321121298 crc32('127.0.0.3')
= 968459908
0 max(int) crc32('meetphp.pl') = 3254572804 crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') =
1321121298 crc32('127.0.0.3') = 968459908 meetphp.pl
sh.st/3254572804 meetphp.pl crc32('meetphp.pl') = 3254572804
shard_3.urls url_id goto date 3254572804 meetphp.pl 2014-01-01 crc32('127.0.0.1') = 3619153832
0 max(int) crc32('127.0.0.1') = 3619153832 crc32('127.0.0.2') = 1321121298 crc32('127.0.0.3') =
968459908 crc32('phpers.pl') = 42311941 phpers.pl
sh.st/42311941 phpers.pl crc32('phpers.pl') = 42311941
shard_1.urls url_id goto date 42311941 phpers.pl 2014-01-01 crc32('127.0.0.3') = 968459908
Searching is expensive
ketama
0 max(int) crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3')
buckets = {}
$bucketsCount = 1024; ! for ($i = 0; $i <
$bucketsCount; $i++) { buckets[i] = max(int) / $bucketsCount * i }
crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') 0 max(int) buckets[0] = 127.0.0.2
0 max(int) crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') buckets[261] = 127.0.0.1
crc32('127.0.0.1') crc32('127.0.0.2') crc32('127.0.0.3') 0 max(int) buckets[772] = 127.0.0.1
meetphp.pl crc32('meetphp.pl') = 3254572804 3254572804 % 1024 = 772
buckets[772] = shard 3
phpers.pl crc32('phpers.pl') = 42311941 42311941 % 1024 = 261
buckets[261] = shard 3
shard_3.urls url_id goto date 1 meetphp.pl 2014-01-01 2 phpers.pl 2014-01-01
resharding ?
1 6
1 6 2
1 6 2 3
1 6 2 3
data distribution
oversharding
problems sharding implementation ? which tables shard ? how big
are our tables ?
problems data count pagers/lists
sharding count insert/select one of the last things to do
Rafał Malinowski @afterdesign