How to scale large database
A talk about techniques for scaling a large database.
duongkai
May 23, 2013
Transcript
How To Scale Large Database Phạm Tùng Dương – CIO03
Course: Advanced Database
Overview • A first glance at large databases • Typical techniques to scale • Database sharding • Database sharding in MySQL
A First Glance at Large Databases
When You Talk about Large Databases
Example Tumblr @2012
Example (Facebook @2010) • 400 million active users • 5 billion pieces of content per week • 3 billion photos uploaded per month
Example (Twitter @2011) • 1 billion tweets per week • 140 million tweets sent per day • 456 tweets per second when Michael Jackson died • 6,939 tweets per second on New Year's Day
What Is a Large Database? • Large working data sets • Write-intensive I/O
Typical approaches
What Is the Bottleneck? I/O, I/O and I/O
We have a job which is called Performance Tuning
Scale up • Add more RAM and more CPU • Use high-I/O HDDs
Scale topology • Replication (Master – Slave): the client reads and writes on the Master and only reads from the Slave • Cluster (shared storage): the client talks to several Masters that share one Storage
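The master/slave layout on this slide implies that something must decide where each statement goes. A minimal sketch of that read/write split follows, assuming statements starting with SELECT are the only reads. The class name ReplicatedRouter and the server labels are hypothetical and not from the talk.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self._readers = itertools.cycle(list(slaves) or [master])

    def route(self, sql):
        # Assumption: a statement that starts with SELECT is read-only.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.master
```

Note that this only scales reads: every write still lands on the single master, which is the limit the later slides on sharding address.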
Caching • Memcached • Redis
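One common way to put Memcached or Redis in front of a database is the cache-aside pattern: look in the cache first, and go to the database only on a miss. The sketch below uses a plain dict as a stand-in for the cache server, so it shows the pattern only and not any real client API. CacheAside and load_from_db are hypothetical names.

```python
class CacheAside:
    """Cache-aside reads: check the cache, fall back to the database."""

    def __init__(self, load_from_db):
        self._cache = {}          # stand-in for Memcached/Redis
        self._load = load_from_db  # hypothetical database read function
        self.db_reads = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call after a write so the next read sees fresh data.
        self._cache.pop(key, None)
```

Caching relieves read I/O only. Writes still reach the database, which is why it does not help a write-intensive workload on its own.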
Finally, Everything in RAM is a Dream!
But, No Silver Bullet!
Database Sharding
What Is Database Sharding? • Horizontal partitioning • Data is stored in small chunks and distributed across many computers • Often used together with replication
Database sharding topology • Primary DB • Shard1, Shard2, Shard3 • Slave1, Slave2, Slave3
Three types • Range sharding • List sharding (lookup table) • Hash sharding
Range sharding • Data is distributed by ranges of the primary key • Example: primary key user_id (1..1000) – user_shard1 (1..500) – user_shard2 (501..1000)
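The user_id example on this slide can be sketched as a lookup over sorted range boundaries. The shard names and ranges come from the slide. The function name range_shard and the use of binary search are assumptions made for illustration.

```python
import bisect

# Inclusive upper bound of each range, in ascending order.
UPPER_BOUNDS = [500, 1000]
SHARDS = ["user_shard1", "user_shard2"]


def range_shard(user_id):
    """Return the shard whose key range contains user_id."""
    if not 1 <= user_id <= UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside 1..{UPPER_BOUNDS[-1]}")
    # bisect_left finds the first bound that is >= user_id.
    return SHARDS[bisect.bisect_left(UPPER_BOUNDS, user_id)]
```

Adding a shard means appending a new bound and name, but if new IDs are always the largest ones, the last shard takes all the new writes.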
List sharding • Data is distributed by an attribute of the data • Example: a database of people in Vietnam, sharded by city_name (Ha_Noi, Hai_Phong, Da_Nang, …)
Hash sharding (modulus) • Data is distributed by applying a hash function to the primary key • Example: primary_key mod N
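The "primary_key mod N" rule can be sketched directly. Integer keys use the modulus as on the slide. Handling non-integer keys with a CRC32 checksum is an assumption added here, chosen because Python's built-in hash() for strings changes between processes.

```python
import zlib


def hash_shard(primary_key, n_shards):
    """Return a shard index in 0..n_shards-1 for primary_key."""
    if n_shards <= 0:
        raise ValueError("n_shards must be positive")
    if isinstance(primary_key, int):
        return primary_key % n_shards
    # Use a stable checksum so every process maps a key the same way.
    return zlib.crc32(str(primary_key).encode("utf-8")) % n_shards


def moved_keys(keys, old_n, new_n):
    """Count the keys whose shard changes when N changes."""
    return sum(1 for k in keys if hash_shard(k, old_n) != hash_shard(k, new_n))
```

Plain modulus spreads keys evenly, but changing N remaps most of them: going from 4 to 5 shards moves 800 of the keys 0..999.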
Pros of Database Sharding • Easy to scale (data, write I/O) • Uses commodity hardware • Minimal impact when part of the system fails
Cons of Database Sharding • You MUST implement it yourself • Operations are harder • Handling join operations is very difficult • Data denormalization => Don't do it just because it's COOL!
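To see why joins become difficult, consider two tables sharded independently: the database can no longer join them for you, so the application has to visit every shard of one table and look up matching rows in the other. The sketch below uses in-memory dicts as stand-ins for shards. The tables (users, orders) and all function names are hypothetical.

```python
def shard_for(key, n_shards):
    # Hash sharding by modulus on an integer primary key.
    return key % n_shards


def put(shards, key, value):
    """Store value under key in the shard that owns the key."""
    shards[shard_for(key, len(shards))][key] = value


def orders_with_user_names(order_shards, user_shards):
    """Application-side join of orders (order_id -> user_id) with users."""
    joined = []
    for shard in order_shards:            # scatter: visit every order shard
        for order_id, user_id in shard.items():
            owner = user_shards[shard_for(user_id, len(user_shards))]
            joined.append((order_id, owner[user_id]))  # one lookup per row
    return sorted(joined)                 # gather: merge and order the rows
```

Every joined row costs an extra lookup on another shard, which is one reason sharded schemas tend to denormalize data instead.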
Database Sharding in MySQL
Sharding Solutions • Application layer • Storage layer • Heavy middleware • Lightweight middleware
Application layer • Hibernate Shards • HiveDB
Storage layer • MySQL Spider – Requires changing the MySQL storage engine
Heavy Middleware • Twitter Gizzard • dbShards – Each database has an agent
Lightweight Middleware • Acts like a proxy • Routes each request to the right shard • Examples: Spock, CUBRID
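The proxy idea can be sketched as a thin object that sits between the client and the shards, picks a backend from the shard key, and forwards the request unchanged. This is an illustration of the concept only and does not reflect how Spock or CUBRID are implemented. ShardProxy and the callable backends are hypothetical.

```python
class ShardProxy:
    """Forward each request to the backend that owns its shard key."""

    def __init__(self, backends):
        # Each backend is a callable that takes a query and returns a result.
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")

    def execute(self, shard_key, query):
        # Routing rule assumed here: modulus on an integer shard key.
        backend = self._backends[shard_key % len(self._backends)]
        return backend(query)
```

The client talks to one endpoint and never learns how many shards exist, so the routing rule can change without touching application code.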
You Will Do It Because You Have To … not because it's Cool!
Q&A