Large scale distributed systems patterns
Ryosuke Iwanaga
September 22, 2025
Technology
Transcript
Large scale distributed systems patterns Ryosuke Iwanaga
Agenda
Visit architecture patterns and talk about their problems
* Web app
* Cloud native
* Microservices at scale
* Resource management
* Event system
* Problem 1: Cold start
* Problem 2: Poison pill
Takeaways
* Distributed systems always fail => Design for failure
* One solution introduces another problem => Design exercise
My experience in large scale distributed systems
* 2009: Mobile browser gaming (SRE / DBA)
  * ~5,000 physical servers
  * MySQL replication, sharding
  * Datacenter operations
* 2015: Cloud (Solutions architect)
  * Architecture
  * Container
  * Analytics
* 2018: Distributed datastore (Developer)
  * Distributed system
  * ~50 Microservices
  * Horizontal scale
  * Cell-based
* 2025: AI!
Web app distributed system
[Diagram: User -> LB -> App servers -> sharded DB clusters, each with one DB Writer and several DB Readers, plus a Payment component]
Typical problems:
* Replication delay
* Deadlock
* Write bottleneck
* LB's scalability
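A minimal Python sketch of the writer/reader split above, assuming modulo sharding by user id and modeling asynchronous replication as an explicit `replicate()` step. `Shard`, `shard_for` and `replicate` are hypothetical names, not from the deck. It shows where replication delay comes from: a read that lands on a replica before replication catches up returns stale data.

```python
import itertools


class Shard:
    """Hypothetical DB shard: one writer plus read replicas that lag behind it."""

    def __init__(self, replica_count=2):
        self.writer = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes accepted by the writer but not yet replicated
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # Every write goes to the single writer: the write bottleneck.
        self.writer[key] = value
        self.pending.append((key, value))

    def read(self, key):
        # Reads are spread round-robin over replicas, which may be stale.
        return self.replicas[next(self._next_replica)].get(key)

    def replicate(self):
        # Asynchronous replication, modeled as an explicit catch-up step.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


def shard_for(user_id, shards):
    # Assumption: users are assigned to shards by user_id modulo shard count.
    return shards[user_id % len(shards)]


shards = [Shard(), Shard()]
shard = shard_for(42, shards)
shard.write("user:42", "Alice")
print(shard.read("user:42"))  # None: the replica has not caught up yet
shard.replicate()
print(shard.read("user:42"))  # Alice
```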
Web app distributed system (with cache)
[Diagram: LB -> App servers; writes go to the DBs, reads go to the Caches and the DBs]
Typical problems:
* Cache invalidation
* Cache scalability
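One common way to combine the DBs and caches is cache-aside with delete-on-write invalidation. The sketch below assumes that pattern (the deck does not say which one it uses) and uses plain dicts to stand in for the DB and a single cache node; `CacheAside` is a hypothetical name.

```python
class CacheAside:
    """Cache-aside reads with delete-on-write invalidation (a minimal sketch)."""

    def __init__(self, db):
        self.db = db  # stands in for the DB tier
        self.cache = {}  # stands in for one cache node
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:  # cache hit
            return self.cache[key]
        self.db_reads += 1  # cache miss: read from the DB
        value = self.db.get(key)
        self.cache[key] = value  # populate the cache for later reads
        return value

    def put(self, key, value):
        self.db[key] = value  # write to the DB first
        self.cache.pop(key, None)  # then invalidate the cached copy


store = CacheAside({"user:1": "Alice"})
store.get("user:1")
store.get("user:1")  # served from the cache
store.put("user:1", "Bob")
print(store.get("user:1"), store.db_reads)  # Bob 2
```

Even this simple scheme is racy under concurrency: a reader that fetched the old value before the write can put it back into the cache after the writer's delete, leaving a stale entry. That is one reason cache invalidation shows up as a typical problem.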
Cloud resource distributed system
[Diagram: a Resource orchestrator/manager schedules App1, App2 and App3 instances across Server 1, Server 2, ... Server N]
Examples: Amazon EC2, Eucalyptus, OpenStack; Hadoop, Mesos, YARN, Omega, Borg, k8s
Typical problems:
* Manager's scalability
* Consistency
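A toy version of the resource manager's core loop, assuming first-fit placement by CPU; `place` and the capacity numbers are illustrative only, and real schedulers such as Borg or k8s use far richer filtering and scoring. Every placement reads and updates one shared view of free capacity, which is why the manager's scalability and the consistency of that view become problems as the cluster grows.

```python
def place(instances, free_cpu):
    """First-fit placement of app instances onto servers (hypothetical manager).

    instances: list of (app_name, cpu) requests, placed in order.
    free_cpu: dict of server name -> free CPU; the caller's dict is not modified.
    Returns a list of (app_name, server) pairs.
    """
    free = dict(free_cpu)  # the manager's view of cluster capacity
    placements = []
    for app, cpu in instances:
        for server, capacity in free.items():
            if capacity >= cpu:
                free[server] = capacity - cpu  # reserve the capacity
                placements.append((app, server))
                break
        else:
            raise RuntimeError(f"no server has {cpu} CPU free for {app}")
    return placements


print(place([("App1", 2), ("App1", 2), ("App2", 3), ("App3", 1)],
            {"Server 1": 4, "Server 2": 4}))
```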
Event distributed system, e.g. Lambda architecture
[Diagram: Apps publish to a Stream. Speed layer: Stream process 1, Stream process 2. Batch layer: Object storage feeding Batch process 1, Batch process 2]
Typical problems:
* At least once / At most once delivery
* Stream scalability
* Back pressure
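At-least-once delivery means a consumer can see the same event twice, for example when an ack is lost and the event is redelivered. A common remedy is an idempotent consumer that remembers processed event ids. The sketch below assumes that remedy; `deliver`, `lost_acks` and `IdempotentSum` are hypothetical, and a real consumer would have to persist `seen` (or rely on the stream's own offsets) rather than keep it in memory.

```python
def deliver(events, consume, lost_acks):
    """Hypothetical at-least-once broker: redeliver an event whose ack was lost."""
    for event_id, payload in events:
        consume(event_id, payload)
        if event_id in lost_acks:
            consume(event_id, payload)  # ack never arrived, so deliver again


class IdempotentSum:
    """Consumer that ignores duplicate deliveries by tracking event ids."""

    def __init__(self):
        self.seen = set()
        self.total = 0

    def consume(self, event_id, payload):
        if event_id in self.seen:
            return  # duplicate: already processed
        self.seen.add(event_id)
        self.total += payload


events = [(1, 10), (2, 5), (3, 1)]
naive = []
deliver(events, lambda _id, payload: naive.append(payload), lost_acks={2})
safe = IdempotentSum()
deliver(events, safe.consume, lost_acks={2})
print(sum(naive), safe.total)  # 21 16
```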
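Microservice distributed system
[Diagram: Users 1 and 2 -> LB -> Services A, B, C, each running Apps. The Apps look up a Metadata service that maps users to databases (User 1 => DB 1, User 2 => DB 2, User 3 => DB 1, ...), so DB 1 holds users 1, 3, 4 and DB 2 holds users 2, 5. 💀 on Metadata]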
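In this design every request first asks the Metadata service which DB holds the user. The sketch below mirrors the mapping on the slide (User 1 => DB 1, User 2 => DB 2, User 3 => DB 1); `MetadataService` and `handle_request` are hypothetical names. It makes the shared dependency explicit: when Metadata is down, every service fails for every user.

```python
class MetadataService:
    """Holds the user -> DB mapping; `up` simulates the service being healthy."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.up = True
        self.calls = 0

    def lookup(self, user_id):
        if not self.up:
            raise ConnectionError("metadata service unavailable")
        self.calls += 1
        return self.mapping[user_id]


def handle_request(service, metadata, user_id):
    # Every service (A, B, C, ...) resolves the user's DB before doing work,
    # so all of them depend on the same Metadata service.
    db = metadata.lookup(user_id)
    return f"Service {service}: user {user_id} served from {db}"


metadata = MetadataService({1: "DB 1", 2: "DB 2", 3: "DB 1"})
print(handle_request("A", metadata, 3))  # Service A: user 3 served from DB 1
metadata.up = False
for service in "ABC":
    try:
        handle_request(service, metadata, 2)
    except ConnectionError as err:
        print(f"Service {service} failed: {err}")
```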
Microservice distributed system: Cache 😁?
[Diagram: the same system, with a Cache between the Apps and the Metadata service]
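A cache in front of Metadata removes most lookups in steady state. A minimal sketch, assuming a per-app in-memory cache with a fixed TTL (the deck does not specify one); `TTLCache`, `fetch` and `ttl` are hypothetical, and the clock is injectable so the expiry behavior can be checked deterministically.

```python
import time


class TTLCache:
    """Per-app cache for Metadata lookups; entries expire after ttl seconds."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self.fetch = fetch  # hypothetical call to the Metadata service
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit: no Metadata call
        value = self.fetch(key)  # miss or expired entry: ask Metadata
        self.entries[key] = (value, now + self.ttl)
        return value


calls = []


def fetch(user_id):
    calls.append(user_id)
    return {1: "DB 1", 2: "DB 2"}[user_id]


now = [0.0]
cache = TTLCache(fetch, ttl=60, clock=lambda: now[0])
cache.get(1)
cache.get(1)
now[0] = 61.0
cache.get(1)
print(len(calls))  # 2: one initial miss, one refresh after expiry
```

The catch is that a freshly started app has an empty cache, so all of its first lookups go to Metadata. That is harmless for one app, but not when many apps start at once.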
Cold start problem
[Diagram: Warm start: an App 🔄restarts and Metadata is fine ✅. Cold start: many Apps 🔄restart at the same time, all hit Metadata at once with empty caches, and Metadata fails 💀]
https://aws.amazon.com/message/11201/
Poison pill problem
💊 If user 1's requests trigger a bug in the app that crashes it...
[Diagram: Users 1, 2, 3 -> LB -> Apps. Each crash triggers a Retry on another App, which crashes too, until every App is down 💀]
* User 1: 0% availability
* User 2: 0% availability
* User 3: 0% availability
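A small simulation of the poison pill problem, assuming the LB retries a failed request on the next live app until it runs out of apps or attempts. `send`, `poison` and `max_attempts` are hypothetical; the point is that retries, normally a reliability feature, spread one bad request across the whole fleet.

```python
def send(apps_alive, poison, max_attempts):
    """Route one request; the LB retries on the next live app after a crash.

    apps_alive is a list of booleans that is updated in place when an app crashes.
    Returns True if some app served the request.
    """
    attempts = 0
    for i, alive in enumerate(apps_alive):
        if not alive:
            continue  # the LB skips apps that are already down
        if attempts == max_attempts:
            break
        attempts += 1
        if poison:
            apps_alive[i] = False  # the bug crashes this app...
            continue  # ...and the LB retries on the next one
        return True
    return False


apps = [True] * 8
print(send(apps, poison=True, max_attempts=10))   # False: user 1 is never served
print(apps.count(True))                           # 0: every app has crashed
print(send(apps, poison=False, max_attempts=10))  # False: users 2 and 3 fail too
```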
Naive sharding
[Diagram: Apps split into fixed shards. User 1's 💊 crashes its shard 💀, which User 3 shares; User 2's shard stays up ✅]
* User 1: 0% availability
* User 2: 100% availability
* User 3: 0% availability
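Naive sharding splits the apps into fixed, non-overlapping groups and pins each user to one group. A sketch assuming 8 apps in shards of 4 and modulo assignment (both hypothetical; with these numbers users 1 and 3 land in the same shard, as on the slide). `availability` here is simplified to the fraction of a user's apps that are still up.

```python
def naive_shard(user_id, num_apps, shard_size):
    """Apps 0..shard_size-1 form shard 0, the next shard_size apps shard 1, etc."""
    num_shards = num_apps // shard_size
    shard = user_id % num_shards  # assumption: modulo assignment
    return set(range(shard * shard_size, (shard + 1) * shard_size))


def availability(user_apps, crashed):
    """Fraction of a user's apps that are still up."""
    return len(user_apps - crashed) / len(user_apps)


crashed = naive_shard(1, num_apps=8, shard_size=4)  # user 1's poison pill
for user in (1, 2, 3):
    print(user, availability(naive_shard(user, 8, 4), crashed))
# 1 0.0
# 2 1.0
# 3 0.0
```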
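Shuffle sharding
[Diagram: each user gets its own random subset of Apps. User 1's 💊 crashes only User 1's Apps 💀; User 2's Apps do not overlap ✅; User 3's Apps partly overlap 💀✅]
* User 1: 0% availability
* User 2: 100% availability
* User 3: 50% availability
https://aws.amazon.com/blogs/architecture/shuffle-sharding-massive-and-magical-fault-isolation/
Probability that the server sets of users 1 and 2, S_1 and S_2, overlap in exactly k servers, with N the total number of servers, m the size of each shard, and k the overlap between users 1 and 2:
$$P(|S_1 \cap S_2| = k) = \frac{\binom{m}{k}\binom{N-m}{m-k}}{\binom{N}{m}}$$
The values below correspond to N = 100 and m = 5:
* 5 overlap (k=5): 0.0000013%
* 4 overlap (k=4): 0.00063%
* 3 overlap (k=3): 0.059%
* 2 overlap (k=2): 1.8%
* 1 overlap (k=1): 21%
* 0 overlap (k=0): 77%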
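The overlap percentages above follow from the hypergeometric distribution. This sketch recomputes them, assuming N = 100 servers and shards of m = 5 (the values that reproduce the slide's numbers), and shows one hypothetical way to give each user a random shard (`shuffle_shard`, seeded by user id so the assignment stays stable).

```python
import random
from math import comb


def overlap_probability(n, m, k):
    """P(two random m-server shards drawn from n servers share exactly k servers)."""
    return comb(m, k) * comb(n - m, m - k) / comb(n, m)


def shuffle_shard(user_id, n, m):
    """Hypothetical assignment: a stable random subset of m servers per user."""
    rng = random.Random(user_id)  # seeded per user, so the shard never changes
    return set(rng.sample(range(n), m))


for k in range(5, -1, -1):
    print(f"{k} overlap (k={k}): {100 * overlap_probability(100, 5, k):.2g}%")

s1, s2 = shuffle_shard(1, 100, 5), shuffle_shard(2, 100, 5)
print("servers shared by users 1 and 2:", len(s1 & s2))
```

With N = 100 and m = 5, two users share no server about 77% of the time, and share all five only with probability 1/C(100, 5), about 1.3 × 10⁻⁸, so a poison pill from one user almost never takes down another user's whole shard.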
What’s next?
[Diagram: the microservice system grows to Services A to E behind the LB, with more Metadata services alongside DB 1 and DB 2; 💀 marks the next failure]
Takeaways (again)
* Distributed systems always fail => Design for failure
* One solution introduces another problem => Design exercise
Thanks! @riywo OpsBR