LXJS 2013: backpack — scalable photo storage
Ivan Babrou
October 02, 2013
Programming
http://bobrik.name/talks/lxjs2013.pdf
— slides with notes.
http://youtu.be/T4DgxvS9Xho
— video.
Transcript
HI THERE, LXJS
% whoami Ivan Babrou, Topface.com
60+ million users, 100+ million photos
16 photos on main page, up to 200 in feed
many small “previews”
This talk is about PHOTOS
Let’s look at some more numbers
12 storage nodes, 70 TB total space, 44 TB used
250 TB per month, 1.6 Gbps peak, 850 Mbps average
powered by node.js & nginx! open-source FTW
ARCHITECTURE aka part 1
frontend -> cache -> resizer -> storage
of course there are frontends, probably more than one: frontend, frontend, frontend
round-robin dns + ipvs, probably, in front of the frontends
NGINX: ngx_http_upstream_hash_module is your friend, because you need more than one cache, right?
#protip don’t cache anything twice
don’t do: frontend -> 3 caches, with file#1, file#2 and file#3 each cached three times
do: frontend -> 3 caches, with file#1, file#2 and file#3 each cached exactly once
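A minimal sketch of the "cache once" idea, assuming the frontend hashes the request URI and always sends the same URI to the same cache. The node names and the `pickCache` helper are hypothetical, and the md5-plus-modulo scheme is only an illustration, not the algorithm ngx_http_upstream_hash_module uses internally.

```javascript
const crypto = require('crypto');

// Hypothetical cache node names; the talk does not list them.
const caches = ['cache-1', 'cache-2', 'cache-3'];

// Map a request URI to exactly one cache node, so each file is cached once.
function pickCache(uri, nodes) {
  const digest = crypto.createHash('md5').update(uri).digest();
  // Interpret the first 4 bytes of the digest as an unsigned integer.
  return nodes[digest.readUInt32BE(0) % nodes.length];
}

// The same URI always lands on the same node.
console.log(pickCache('/photos/lol.jpg', caches) === pickCache('/photos/lol.jpg', caches));
```

With plain modulo hashing, changing the number of caches remaps most URIs, so the caches start cold after a resize.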
NGINX + SSD is just great for caching, forget about tmpfs
#protip overallocate caches
RESIZING: resizing on the fly saves disk, but eats cpu
NGINX: ngx_http_image_filter_module is your friend
BACKPACK aka part 2
first try nginx
okay for 1k files
okay for 10k files
okay for 50k files
okay as long as you fit in memory or have ssd
RANDOM ACCESS
DISKS ARE SPINNING
node.js to the rescue!
... and redis
... and zookeeper
simple idea: no extra fseek(3)
inspired by haystack from facebook
concatenate small files into bigger
always keep index in memory
IMPLEMENTATION ON DISK
3.5 GB data files, as many as you need
an index for each data file: one name:offset:length entry per stored file
but.. no worries, this is only needed if redis goes crazy
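A sketch of how the on-disk index could be read back if the redis keys were lost, assuming one `name:offset:length` entry per line. The line-per-entry layout and the `parseIndex` name are assumptions; the slides only show the three fields.

```javascript
// Parse index text made of "name:offset:length" lines into a Map.
function parseIndex(text) {
  const index = new Map();
  for (const line of text.split('\n')) {
    if (line === '') continue; // skip the trailing empty line
    // Split from the right so that names containing ':' still parse.
    const lastColon = line.lastIndexOf(':');
    const prevColon = line.lastIndexOf(':', lastColon - 1);
    const name = line.slice(0, prevColon);
    const offset = Number(line.slice(prevColon + 1, lastColon));
    const length = Number(line.slice(lastColon + 1));
    index.set(name, { offset, length });
  }
  return index;
}

const rebuilt = parseIndex('lol.jpg:0:5\nwtf.jpg:5:7\n');
console.log(rebuilt.get('wtf.jpg')); // { offset: 5, length: 7 }
```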
IMPLEMENTATION IN MEMORY
keys for files: name -> file:offset:length, one key per stored file
all together: redis keys in memory; 3.5 GB data files, each with its index, on disk
POWERED BY node.js, looks like webdav
PUT: 1. write data 2. write index 3. write redis key
GET: 1. read redis key 2. read data
data files are always open!
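The PUT and GET steps above can be sketched roughly as follows. This is not the backpack source: the `Backpack` class, the `.index` file suffix, and the in-memory `Map` standing in for redis are all assumptions made to keep the example self-contained.

```javascript
const fs = require('fs');

class Backpack {
  constructor(dataPath) {
    // Data file is opened once and stays open; 'a+' appends on write, allows reads.
    this.fd = fs.openSync(dataPath, 'a+');
    // Hypothetical naming: the index sits next to the data file.
    this.indexFd = fs.openSync(`${dataPath}.index`, 'a');
    this.size = fs.fstatSync(this.fd).size;
    this.keys = new Map(); // stand-in for redis: name -> { offset, length }
  }

  put(name, data) {
    const offset = this.size;
    fs.writeSync(this.fd, data); // 1. write data (appended to the big file)
    this.size += data.length;
    fs.writeSync(this.indexFd, `${name}:${offset}:${data.length}\n`); // 2. write index
    this.keys.set(name, { offset, length: data.length }); // 3. write key
  }

  get(name) {
    const entry = this.keys.get(name); // 1. read key
    if (!entry) return null;
    const buffer = Buffer.alloc(entry.length);
    // 2. read data with a single positioned read on the already open file
    fs.readSync(this.fd, buffer, 0, entry.length, entry.offset);
    return buffer;
  }
}
```

Usage would look like `new Backpack('/tmp/data.bin').put('lol.jpg', Buffer.from('...'))`, with `data` always passed as a `Buffer` so that `data.length` is a byte count.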
let’s read 100K files! [bar chart: backpack vs nginx, axis from 0 to 150; bar values not recoverable]
FEWER SEEKS LEAD TO HIGHER THROUGHPUT
BONUS! linearized access for processing
BUT WHAT ABOUT THE FUTURE?
NO MORE MEMORY vs DISK* (*probably, someday)
MANAGEMENT aka part 3
1. adding servers 2. replication 3. failover
COORDINATOR
combines servers into shards
that’s where we need zookeeper
I KNOW let’s use DHT! like dynamo!
Rebalancing on capacity change
NO!
NO. THANK YOU!
LET’S MAKE IT SIMPLE
SHARDS (aka buckets): backpacks #1 to #6 are grouped into shard #1 (50%) and shard #2 (50%); stored names carry the shard id: 1:lol.jpg, 2:wtf.jpg
ADDING SHARD: backpacks #7, #8 and #9 join as shard #3 (0%) next to shard #1 (50%) and shard #2 (50%); files already stored (1:lol.jpg, 1:wtf.jpg) stay in place; diagram label: 50% chance
COORDINATOR knows how to handle next file
NO REBALANCING SIMPLE
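A sketch of how a coordinator might place the next file without rebalancing, assuming each shard has a weight and the chosen shard id is stored as a prefix of the file name (as in `1:lol.jpg`). The transcript does not say what the slide percentages mean, so the weights, `pickShard`, `assignName` and `shardOf` are hypothetical.

```javascript
// Pick a shard at random, proportionally to its weight.
// `random` is injectable so the choice can be made deterministic.
function pickShard(shards, random = Math.random) {
  const total = shards.reduce((sum, shard) => sum + shard.weight, 0);
  let roll = random() * total;
  for (const shard of shards) {
    if (roll < shard.weight) return shard;
    roll -= shard.weight;
  }
  return shards[shards.length - 1]; // fallback for rounding or all-zero weights
}

// New files get the shard id baked into their stored name.
function assignName(shards, name, random) {
  return `${pickShard(shards, random).id}:${name}`;
}

// Reads never consult the weights: the shard id is already in the name.
function shardOf(storedName) {
  return Number(storedName.split(':', 1)[0]);
}

const shards = [{ id: 1, weight: 1 }, { id: 2, weight: 1 }];
console.log(shardOf(assignName(shards, 'lol.jpg')));
```

Because old names keep their shard prefix, adding a shard only changes where new files go, which is why no existing data has to move.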
REPLICATOR
WHAT IF A METEORITE HITS YOUR NODE?
IT HAPPENS. YOU NEED TO ACCEPT THAT.
REPLICATOR to the rescue!
make multi-node SHARDS
DISTRIBUTE SHARDS ACROSS SERVERS
shard #1: lol.jpg stored in three backpacks on three servers (the slide labels each one backpack #1 / server #1), with coordinator and replicator alongside
REPLICATOR EVENTUALLY MAKES COPIES
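"Eventually" can be sketched as a queue of pending copies that is drained some time after the write is accepted. This is a toy in-memory model, not the replicator's code: nodes are `Map`s, the queue is a plain array standing in for a redis queue, and all function names are hypothetical.

```javascript
// A shard is modelled as an array of nodes; each node is a Map of name -> data.
function createShard(nodeCount) {
  return Array.from({ length: nodeCount }, () => new Map());
}

// Accept a write on one node and queue the remaining copies for later.
function put(shard, queue, name, data) {
  shard[0].set(name, data);
  queue.push({ shard, name });
}

// Drain the queue: copy each queued file to every node that lacks it.
function replicate(queue) {
  let copies = 0;
  while (queue.length > 0) {
    const { shard, name } = queue.shift();
    const source = shard.find((node) => node.has(name));
    if (!source) continue; // nothing to copy from; a real system would retry or alert
    for (const node of shard) {
      if (!node.has(name)) {
        node.set(name, source.get(name));
        copies += 1;
      }
    }
  }
  return copies;
}
```

Between `put` and `replicate` only one node holds the file, which is the window the "eventually" on the slide refers to.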
THE WHOLE THING IS BULLET-PROOF IF YOU NEED IT
[diagram: backpacks #1 to #9 spread over server #1, server #2 and server #3, plus three instances each of zookeeper, redis-queue, coordinator and replicator]
GET THE CODE: /Topface/backpack
npm install backpack{,-coordinator,-replicator}
That’s it! bobrik ibobrik