How Patroni solved Database Reliability at Gojek
Kumar Abhijeet
March 28, 2024
Transcript
How Patroni solved Database Reliability at Gojek. Kumar Abhijeet, Cloud Platforms
Fabricating DBaaS@Gojek; DevOps/Platforms; Home Gym Owner; Budding Musician
Agenda: Gojek - Scale and Microservices; Databases & Reliability; Patroni & 5 9s of Availability; Deep dive into Patroni; Managing Patroni on production - Lessons & Experiences
~600 microservices running on production; ~400 have databases
600k RPM; 12,000 WALs/hour
18Bn record inserts/month; 85Bn records fetched/month
Will a conventional master-slave PostgreSQL system be able to support
country-level scale?
[Diagram labels: API Traffic, LB, App Server Workloads, PostgreSQL VMs]
Cloud Provider’s Compute Uptime: >= 99.9% (< 8h 41m of downtime/year). Across multiple zones: >= 99.99% (< 52m of downtime/year)
Database Uptime ≅ App Uptime
Target: >= 99.999% uptime, roughly 5m of downtime/year
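The downtime figures on these slides follow directly from the availability percentages. A minimal sketch of that arithmetic, assuming a 365-day year (the function name `downtime_budget` is made up for illustration; the slide figures are rounded and may come from a calculator with a slightly different year length):

```python
from datetime import timedelta

# Assumption: a non-leap, 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget(availability_pct: float) -> timedelta:
    """Return the maximum yearly downtime allowed by an availability target."""
    unavailable_fraction = 1 - availability_pct / 100
    # Round to whole seconds; sub-second precision is not meaningful here.
    return timedelta(seconds=round(SECONDS_PER_YEAR * unavailable_fraction))


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_budget(target)}")
```

Under this assumption the three targets allow 8:45:36, 0:52:34 and 0:05:15 of downtime per year, so each extra nine cuts the budget by a factor of ten.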
[Diagram labels: API Traffic, LB, App Server Workloads, PostgreSQL VMs; New Master, Old Master, Replica]
[Diagram labels: API Traffic, LB, App Server Workloads, PostgreSQL VMs; New Master, Old Master, Replica; shared_buffers=16MB, shared_buffers=2GB]
Enter Patroni!
Patroni: Open Source and actively maintained by Zalando. Converts PostgreSQL systems into Highly Available, Fault Tolerant, Disaster Ready
Patroni: Almost instantaneous failovers (~1-2s); Way cheaper than running managed DB solutions; Cluster Management made easy; Multi Region HA Deployments
HA Loop Flow
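The flow diagram for this slide did not survive in the transcript. As a rough illustration of the general idea behind a lease-based HA loop, the sketch below simulates nodes competing for a single leader key with a TTL. This is not Patroni's code: `InMemoryDCS`, `Node`, `ha_cycle` and `simulate` are hypothetical names, the DCS is an in-memory stand-in, and the time units are abstract ticks that say nothing about real failover times.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InMemoryDCS:
    """Hypothetical stand-in for a DCS such as Consul: one leader key with a TTL."""
    ttl: float
    leader: Optional[str] = None
    expires_at: float = 0.0

    def current_leader(self, now: float) -> Optional[str]:
        # An expired lease is treated as if the key did not exist.
        return self.leader if now < self.expires_at else None

    def acquire_or_renew(self, node: str, now: float) -> bool:
        # Succeeds only if the key is free or already owned by `node`.
        holder = self.current_leader(now)
        if holder is None or holder == node:
            self.leader, self.expires_at = node, now + self.ttl
            return True
        return False


@dataclass
class Node:
    name: str
    role: str = "replica"
    healthy: bool = True

    def ha_cycle(self, dcs: InMemoryDCS, now: float) -> None:
        """One pass of the simplified loop: hold the lease or follow its owner."""
        if not self.healthy:
            return  # a crashed node cannot renew its lease
        if dcs.acquire_or_renew(self.name, now):
            self.role = "primary"  # owns the leader key
        else:
            self.role = "replica"  # another node owns it


def simulate() -> List[Optional[str]]:
    """Run 7 ticks; the first primary crashes at tick 2."""
    dcs = InMemoryDCS(ttl=3.0)
    nodes = [Node("pg-1"), Node("pg-2"), Node("pg-3")]
    leaders: List[Optional[str]] = []
    for tick in range(7):
        if tick == 2:
            nodes[0].healthy = False
        for node in nodes:
            node.ha_cycle(dcs, float(tick))
        leaders.append(dcs.current_leader(float(tick)))
    return leaders


if __name__ == "__main__":
    print(simulate())
```

In this toy run the lease held by `pg-1` keeps it listed as leader until the TTL lapses, after which `pg-2` acquires the key. The point of the design is that at most one node can hold the key at any moment, which is what prevents two primaries.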
Downtime in seconds ≈ 0.0000315576
Patroni at Gojek: 200+ clusters running on Production; ~60 TB of data flows in/out every day; Guarantees less than 10 MB of data loss; Consul as DCS and service discovery; IaC everywhere!
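One way a bound like "less than 10 MB of data loss" can be enforced is by refusing to promote replicas that lag too far behind. Patroni exposes a `maximum_lag_on_failover` setting (in bytes) for this purpose. Whether Gojek's guarantee comes from that setting, and at what value, is an assumption here. The sketch below only illustrates the filtering idea; `eligible_candidates` and the node names are hypothetical.

```python
from typing import Dict, List

# Assumption: a 10 MB threshold, taken from the figure on the slide.
MAX_LAG_BYTES = 10 * 1024 * 1024


def eligible_candidates(leader_lsn: int,
                        replica_lsns: Dict[str, int],
                        max_lag: int = MAX_LAG_BYTES) -> List[str]:
    """Return replicas within `max_lag` bytes of the leader, least lagged first.

    `leader_lsn` is the last known WAL position of the failed leader and
    `replica_lsns` maps replica name to its replayed WAL position, in bytes.
    """
    lags = {name: max(leader_lsn - lsn, 0) for name, lsn in replica_lsns.items()}
    within_bound = (name for name, lag in lags.items() if lag <= max_lag)
    return sorted(within_bound, key=lambda name: lags[name])


if __name__ == "__main__":
    replicas = {"pg-2": 499_000_000, "pg-3": 480_000_000, "pg-4": 500_000_000}
    print(eligible_candidates(500_000_000, replicas))
```

With these sample positions, `pg-3` is 20,000,000 bytes behind and is excluded, while `pg-4` (no lag) and `pg-2` (1,000,000 bytes behind) remain eligible. The trade-off is that a strict threshold can leave no eligible replica at all, so availability is exchanged for a bound on lost writes.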
Patroni at Gojek: Terraform modules for provisioning, Chef for configuration; Sync/Async replication choices; All round observability!; Secure and granular role-based access; PR based workflow for infra provisioning
Thank you!