Learning to Build Distributed Systems the Hard Way
Theo Hultberg
December 04, 2012
Presentation held at JDays, 4 December 2012
Transcript
LEARNING TO BUILD DISTRIBUTED SYSTEMS THE HARD WAY @iconara
LEARNING TO BUILD DISTRIBUTED SYSTEMS THE HARD WAY BIG DATA
@iconara
speakerdeck.com/u/iconara (real time!)
Theo / @iconara
chief architect at BURT
let’s make online advertising a great experience
MAKING THIS
INTO THIS
HOW HARD CAN IT BE?
30K REQUESTS PER SECOND more than a billion requests per day, over 1 TB raw data
ONE VISIT CAN CHANGE UP TO 100K COUNTERS hundreds of millions of individual counters per day, plus counting uniques and visitor histories
IN REAL TIME or near real time, if you want to be pedantic
HOW HARD CAN IT BE?
START WITH TWO OF EVERYTHING going from one to two is the hardest, solve the scaling problem up front
START WITH THREE OF EVERYTHING you’ll solve the scaling problem, and need less overcapacity
GIVE A LOT OF THOUGHT TO KEYS AND IDS and think about your queries first
MEIHO0 JME57Z (a timestamp, then something random) monotonically increasing, sorts nicely
JME57Z MEIHO0 (something random, then a timestamp) uniformly distributed, works nicely with sharding
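A minimal Ruby sketch of the two key layouts, assuming each ID is a six-character base-36 timestamp combined with six random base-36 characters. The helper names (base36, random_part, sortable_id, shardable_id) and the 6+6 layout are assumptions made for illustration; the slides only show the example IDs.

```ruby
require "securerandom"

# Assumed layout: 6 base-36 characters per part (not stated in the talk).
BASE = 36
PART_LENGTH = 6

# Encode a non-negative integer as a fixed-width, upper-case base-36 string.
def base36(number, width = PART_LENGTH)
  number.to_s(BASE).upcase.rjust(width, "0")
end

# Six random base-36 characters.
def random_part
  base36(SecureRandom.random_number(BASE**PART_LENGTH))
end

# Timestamp first: an ID created in a later second sorts after earlier ones.
def sortable_id(time = Time.now)
  base36(time.to_i) + random_part
end

# Random first: the leading characters are uniformly distributed,
# so prefix-based sharding spreads writes across shards.
def shardable_id(time = Time.now)
  random_part + base36(time.to_i)
end
```

Which layout fits depends on the queries: the timestamp-first form suits range scans over time, while the random-first form avoids sending all current writes to the same shard.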
CONSISTENCY IS OVERRATED don’t fear R + W < N
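For readers unfamiliar with the inequality, a small Ruby sketch, assuming the usual quorum notation: N is the number of replicas, W the number that must acknowledge a write, and R the number consulted on a read. A read is only guaranteed to touch a replica holding the latest write when R + W > N. The method names are hypothetical.

```ruby
# Minimum number of replicas that any read set of size r is guaranteed
# to share with any write set of size w, out of n replicas in total.
def guaranteed_overlap(n:, r:, w:)
  [r + w - n, 0].max
end

# True when every read is guaranteed to see at least one up-to-date replica.
def read_sees_latest_write?(n:, r:, w:)
  guaranteed_overlap(n: n, r: r, w: w) > 0
end
```

With N = 3, the setting R = W = 2 always overlaps on at least one replica, while R = W = 1 does not: reads may be stale, in exchange for waiting on fewer replicas.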
PRECOMPUTE ALL THE THINGS your users most likely don’t know what they want, so why let them do ad hoc queries?
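One way to picture precomputation, sketched in Ruby under stated assumptions: every incoming event increments a counter for each combination of its dimensions at write time, so a query becomes a single key lookup. The dimension names, the key format and the method names are all hypothetical; the talk does not describe the actual scheme. It also hints at why one visit can touch so many counters, since the number of keys grows as 2^d - 1 for d dimensions.

```ruby
# Build one counter key per non-empty combination of dimensions.
def rollup_keys(event, dimensions)
  (1..dimensions.size).flat_map do |size|
    dimensions.combination(size).map do |dims|
      dims.map { |dim| "#{dim}=#{event.fetch(dim)}" }.join("|")
    end
  end
end

# Increment every precomputed counter that this event contributes to.
def record(counters, event, dimensions)
  rollup_keys(event, dimensions).each { |key| counters[key] += 1 }
  counters
end

counters = Hash.new(0)
dimensions = [:site, :country, :browser]
record(counters, { site: "a", country: "se", browser: "ff" }, dimensions)
record(counters, { site: "a", country: "no", browser: "ff" }, dimensions)
```

After these two events, counters["site=a|browser=ff"] can be read directly without scanning any raw data.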
SEPARATE PROCESSING FROM STORAGE that way you can scale each independently
PLAN HOW TO GET RID OF YOUR DATA deleting stuff is harder than you might think
NoDB keep things streaming
DIVIDE THE LOAD big data systems are all about routing and partitioning
RANDOM when you have no interdependencies between things it’s easy to scale out
CONSISTENT when there are interdependencies you need to route using some property of the objects, but make sure you get a uniform distribution
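The two routing styles can be sketched in a few lines of Ruby. This is plain hash-modulo routing, chosen here for brevity; it is an assumption, not the method described in the talk, and it is not a consistent-hashing ring. The method names and the "visitor-1" style key are hypothetical.

```ruby
require "zlib"

# Random routing: fine when items have no interdependencies.
def route_random(partitions)
  rand(partitions)
end

# Routing on a property of the object: the same key always lands on the
# same partition, and the checksum spreads keys roughly evenly.
def route_by_key(key, partitions)
  Zlib.crc32(key.to_s) % partitions
end
```

Routing on a raw property (for example the first letter of a name) would keep related items together but skew the load; hashing the property first is one common way to aim for the uniform distribution the slide asks for.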
NUMEROLOGY
12
2 | 12, 3 | 12, 4 | 12, 6 | 12
8 | 24, 5 | 60
A DIVERSION ABOUT COUNTING TO 60 the reason why there’s 60 seconds to a minute, and 360 degrees to a circle
3 SEGMENTS ON EACH FINGER = 12
FIVE FINGERS ON OTHER HAND = 60
12, 60, 120, 360 superior highly composite numbers
BLAH BLAH BLAH use multiples of 12 to scale without always having to double
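A short Ruby illustration of why multiples of 12 help, assuming a fixed number of partitions that must be spread evenly over the nodes. The method name is hypothetical.

```ruby
# Node counts over which a fixed number of partitions divides evenly.
def even_node_counts(partitions)
  (1..partitions).select { |nodes| (partitions % nodes).zero? }
end

even_node_counts(12) # => [1, 2, 3, 4, 6, 12]
even_node_counts(16) # => [1, 2, 4, 8, 16]
```

With 12 partitions a cluster can grow from 2 to 3 to 4 to 6 nodes and stay balanced, whereas with 16 partitions every balanced step is a doubling.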
log2(36^6) ≈ 31 six characters 0-9, A-Z can represent 31 bits, which is kind of almost very close to four bytes
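The arithmetic can be checked directly in Ruby: six characters from a 36-symbol alphabet give 36^6 combinations, which is slightly more than 2^31 and less than 2^32. The method names are hypothetical.

```ruby
# Number of bits of information carried by `length` characters
# drawn from an alphabet of `alphabet_size` symbols.
def bits_for(alphabet_size, length)
  Math.log2(alphabet_size**length)
end

# Whether every value of the given bit width fits in such a string.
def fits_bits?(alphabet_size, length, bits)
  alphabet_size**length >= 2**bits
end

bits_for(36, 6)       # a little over 31
fits_bits?(36, 6, 31) # => true
fits_bits?(36, 6, 32) # => false
```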
MEIHO0 a timestamp Time.now.to_i.to_s(36).upcase
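The encoding on the slide is reversible. A small Ruby sketch of the round trip, with hypothetical method names, shows that the example ID decodes to the afternoon of the talk:

```ruby
# Encode a Time as upper-case base 36, as on the slide.
def encode_timestamp(time)
  time.to_i.to_s(36).upcase
end

# Decode a base-36 string back into a UTC Time.
def decode_timestamp(id)
  Time.at(id.to_i(36)).utc
end

decode_timestamp("MEIHO0") # => 2012-12-04 15:00:00 UTC
```

Because the string is six characters until the count of seconds reaches 36^6, plain string comparison orders these IDs the same way as the underlying timestamps.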
YOU CAN’T SCALE TO REAL TIME and don’t trust code that doesn’t run continuously
DO YOU REALLY NEED A BACKUP? if you have 3x replication over multiple availability zones, is that backup really worth it?
PRODUCTION IS THE ONLY REAL TEST ENVIRONMENT when thousands of things happen every second, new, weird and unforeseen things happen all the time; your tests can only cover the foreseeable
GÖTEBORG, DISTRIBUTED @gbgdistr
KTHXBAI @iconara github.com/iconara architecturalatrocities.com burtcorp.com