Storing data is something we expect computers to just do. When your application writes data to a database, you trust the database to give you that data back later. But what does it take to make that reliable?
In this session, we'll explore the ways that computers can surprise us by failing to save or corrupting our data. We'll do this through the lens of databases, with a focus on MySQL and Postgres.
Specifically, we'll cover:
- The MySQL doublewrite buffer: the mechanism InnoDB uses to recover from torn pages, where a crash leaves a page only partially written to disk
- The Postgres fsyncgate incident: when the Postgres team realised that the guarantees around Linux's fsync syscall, particularly after a failed write, weren't as strong as they thought
- Write-back caches on disks: how manufacturers win benchmarks at the cost of data safety by acknowledging writes before they reach persistent storage
We'll also look at how database replication can partially paper over these problems for us and the limits of what it can do.