How Patroni solved Database Reliability at Gojek
Kumar Abhijeet
March 28, 2024
Transcript
How Patroni solved Database Reliability at Gojek
Kumar Abhijeet, Cloud Platforms
Fabricating DBaaS@Gojek, DevOps/Platforms, Home Gym Owner, Budding Musician
Agenda
- Gojek - Scale and Microservices
- Databases & Reliability
- Patroni & 5 9s of Availability
- Deep dive into Patroni
- Managing Patroni in production - Lessons & Experiences
~600 microservices running in production; ~400 of them have databases
600k RPM (requests per minute); 12,000 WAL files/hour
18Bn record inserts/month; 85Bn records fetched/month
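To put these figures in per-second terms, here is a back-of-the-envelope sketch. It assumes PostgreSQL's default 16 MiB WAL segment size and a 30-day month; neither is stated in the talk, so treat the outputs as rough averages rather than measured rates.

```python
# Back-of-the-envelope conversion of the slide's headline numbers into rates.
# Assumptions (not from the deck): default 16 MiB WAL segments, 30-day month.

WAL_SEGMENT_MIB = 16                 # PostgreSQL's default wal_segment_size; assumed
SECONDS_PER_MONTH = 30 * 24 * 3600   # 30-day month, for illustration only


def scale_rates() -> dict[str, float]:
    """Return the deck's traffic figures as average per-second (or per-hour) rates."""
    return {
        "requests_per_second": 600_000 / 60,                  # 600k RPM
        "wal_gib_per_hour": 12_000 * WAL_SEGMENT_MIB / 1024,  # 12,000 WAL files/hour
        "inserts_per_second": 18e9 / SECONDS_PER_MONTH,       # 18Bn inserts/month
        "fetches_per_second": 85e9 / SECONDS_PER_MONTH,       # 85Bn fetches/month
    }


if __name__ == "__main__":
    for name, value in scale_rates().items():
        print(f"{name}: {value:,.1f}")
```

Under those assumptions this works out to about 10,000 requests/s, roughly 187.5 GiB of WAL per hour, and on average around 6,900 inserts/s and 32,800 fetches/s; peak rates are presumably higher.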
Will a conventional master-slave PostgreSQL system be able to support country-level scale?
[Diagram: API Traffic, LB, App Server Workloads, PostgreSQL VMs]
Cloud Provider’s Compute Uptime
>= 99.9%: up to ~8h 46m of downtime/year
Across multiple zones: >= 99.99%, up to ~52m 36s of downtime/year
Database Uptime ≅ App Uptime
Target: >= 99.999% uptime, up to ~5m 16s of downtime/year
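The downtime budgets follow from multiplying the unavailable fraction by the length of a year. A minimal sketch, assuming a 365.25-day year and, for the serial case, independent component failures; the helper names are our own, not from the talk:

```python
# Downtime budget implied by an availability target, plus the availability of a
# chain of components that must all be up at once (e.g. LB -> app -> database).
# Assumes a 365.25-day year and independent component failures.
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 s


def downtime_per_year(availability: float) -> float:
    """Seconds of downtime per year allowed by an availability fraction (e.g. 0.99999)."""
    return (1 - availability) * SECONDS_PER_YEAR


def format_duration(seconds: float) -> str:
    """Render seconds as 'Xh Ym Zs', rounded to the nearest second."""
    hours, rem = divmod(round(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


def serial_availability(components: list[float]) -> float:
    """Availability of a request path that needs every component up simultaneously."""
    return math.prod(components)


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        print(f"{target:.3%} -> {format_duration(downtime_per_year(target))}/year")
```

Running it prints 8h 45m 58s, 0h 52m 36s and 0h 5m 16s for three, four and five nines. The serial helper shows why database uptime bounds app uptime: three components at 99.999% each compose to roughly 99.997%, so the app can never be more available than the database it depends on.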
[Diagram: API Traffic, LB, App Server Workloads, PostgreSQL VMs, with Old Master, New Master and Replica labels]
[Diagram: API Traffic, LB, App Server Workloads, PostgreSQL VMs, with Old Master, New Master and Replica labels; nodes annotated shared_buffers=16MB and shared_buffers=2GB]
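The diagram appears to show a replica provisioned with different settings from the master, so that after failover the new master runs with the wrong shared_buffers. Patroni can keep such parameters uniform by storing them in its dynamic configuration in the DCS; as a complementary check, here is a hypothetical helper (names and values are illustrative, not from the talk) that flags parameters whose values differ across nodes:

```python
# Hypothetical helper: report PostgreSQL parameters whose values differ across
# the nodes of one cluster, the kind of drift the slide illustrates with
# shared_buffers=16MB on one node and shared_buffers=2GB on another.
from typing import Optional


def find_config_drift(
    node_params: dict[str, dict[str, str]],
) -> dict[str, dict[str, Optional[str]]]:
    """Map each drifting parameter to its value on every node (None if unset there)."""
    all_keys: set[str] = set().union(*node_params.values())
    drift: dict[str, dict[str, Optional[str]]] = {}
    for key in sorted(all_keys):
        values = {node: params.get(key) for node, params in node_params.items()}
        if len(set(values.values())) > 1:  # more than one distinct value => drift
            drift[key] = values
    return drift


if __name__ == "__main__":
    cluster = {  # hypothetical node names and values
        "old-master": {"shared_buffers": "2GB", "max_connections": "500"},
        "replica": {"shared_buffers": "16MB", "max_connections": "500"},
    }
    print(find_config_drift(cluster))
    # {'shared_buffers': {'old-master': '2GB', 'replica': '16MB'}}
```

Running such a check before promoting a replica, or in CI for the provisioning code, would surface the mismatch before a failover does.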
Enter Patroni!
Patroni: open source and actively maintained by Zalando. Converts PostgreSQL systems into:
- Highly Available
- Fault Tolerant
- Disaster Ready
Patroni
- Almost instantaneous failovers (~1-2s)
- Way cheaper than running managed DB solutions
- Cluster management made easy
- Multi-region HA deployments
HA Loop Flow
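The HA loop diagram is not reproduced in the transcript. As a rough illustration of the idea behind it, here is a single-process sketch of leader election over a TTL-based leader key. Class, function and constant names are hypothetical; Patroni's real loop (run every loop_wait seconds, with the leader key's lifetime set by ttl) also compares WAL positions across members, handles demotion and reconfigures replicas. Mapping the deck's "< 10 MB data loss" to a 10 MiB lag cutoff, akin to Patroni's maximum_lag_on_failover setting (in bytes), is our assumption.

```python
# Simplified, single-process model of a Patroni-style HA loop: the leader holds a
# key in the DCS with a TTL and renews it every cycle; once the key expires, a
# replica that is not too far behind may take it and promote itself.
# NOT Patroni's code: all names here are hypothetical.
from dataclasses import dataclass
from typing import Optional

MAX_LAG_BYTES = 10 * 1024 * 1024  # assumed mapping of the deck's "< 10 MB of data loss"


@dataclass
class LeaderKey:
    """In-memory stand-in for the leader key in the DCS (Consul at Gojek)."""
    holder: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, node: str, now: float, ttl: float) -> bool:
        # Succeeds if the key is free, already ours, or expired. A real DCS
        # performs this as an atomic compare-and-set across nodes.
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + ttl
            return True
        return False


def ha_cycle(node: str, key: LeaderKey, now: float, ttl: float, lag_bytes: int) -> str:
    """Run one HA-loop iteration for `node` and describe what it decided."""
    if key.holder == node and now < key.expires_at:
        key.try_acquire(node, now, ttl)  # renew the lease
        return "leader: renewed key"
    if now < key.expires_at:
        return f"replica: following {key.holder}"
    # The key has expired, so the previous leader is presumed gone.
    if lag_bytes > MAX_LAG_BYTES:
        return "replica: too far behind to promote"
    if key.try_acquire(node, now, ttl):
        return "leader: promoted"
    # Unreachable in this single-threaded model; a real DCS race could land here.
    return f"replica: following {key.holder}"


if __name__ == "__main__":
    key = LeaderKey()
    print(ha_cycle("pg-1", key, now=0, ttl=30, lag_bytes=0))    # leader: promoted
    print(ha_cycle("pg-2", key, now=10, ttl=30, lag_bytes=0))   # replica: following pg-1
    # pg-1 stops renewing; by t=41 its key (set at t=0, TTL 30) has expired.
    print(ha_cycle("pg-2", key, now=41, ttl=30, lag_bytes=20 * 1024 * 1024))
    print(ha_cycle("pg-3", key, now=41, ttl=30, lag_bytes=512 * 1024))
```

In this model, pg-2 declines to promote because it is 20 MiB behind, and pg-3, lagging only 512 KiB, takes the key. The lag cutoff is what bounds data loss: a replica too far behind stays a replica rather than becoming a master that is missing recent writes.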
Downtime in seconds ≈ (1 - 0.99999) × 31,557,600 s/year ≈ 315.58 s, i.e. about 5m 16s per year
Patroni at Gojek
- 200+ clusters running in production
- ~60 TB of data flows in/out every day
- Guarantees less than 10 MB of data loss
- Consul as DCS and service discovery
- IaC everywhere!
Patroni at Gojek
- Terraform (TF) modules for provisioning, Chef for configuration
- Sync/async replication choices
- All-round observability!
- Secure and granular role-based access
- PR-based workflow for infra provisioning
Thank you!