Premday #3 - KVM Hypervisor Recovery - When disaster strikes

Booking.com presents its strategies for anticipating and managing disaster recovery plans.

Premday

June 08, 2026

Transcript

  1. Agenda.
     Introductions
     • About Me
     • Booking.com Fast Facts
     • Setting Expectations
     KVM Hypervisor Recovery
     • BC2: Virtualization @ Booking
     • Problem: Disaster Recovery
     • Solution: Immutable Snapshots
     Conclusion
     • Lessons Learned
     • Call to Action
     • Q&A
  2. About Me.
     • Principal Engineer at Booking.com
     • Based in Amsterdam
     • 31 years of industry experience
     • Specialties
       ✓ Virtualization
       ✓ Data Storage
       ✓ Disaster Recovery
     • Personal
       ✓ Live on a houseboat
       ✓ Love to travel
       ✓ Excited to be in Paris!
  3. Fast Facts.
     • Headquartered in Amsterdam, Netherlands
     • 29 million+ listings
     • More than 174 thousand destinations
     • Over 220 countries and territories
     • Website and support in 45 languages
     • 100k+ servers, VMs, and containers
     • 3 Availability Zones in Central/Western Europe
  4. Setting Expectations.
     No 🚫
     • Case Studies
     • Product Pitches
     • Artificial Intelligence
     • Misuse of "on premise"
     Yes ✅
     • Old Ideas
     • Old Presenter
     • Old School Engineering
  5. What is BC2?
     • In-house virtualization platform
     • Linux KVM hypervisors + EPYC servers
     • In production since July 2025
     • Replacing Bare Metal servers
     • Replacing OpenStack virtual machines
     • One of the fastest growing infrastructure services in Booking's history
  6. Success Story - MySQL Instances.
     • Nov 2025: Exceeded count on AWS EC2
     • Jan 2026: Exceeded count on Bare Metal
     • Mar 2026: Exceeded count on OpenStack
     • EOY 2026: 100% BC2 On-Prem
       ✓ ~10,000 Instances
       ✓ ~1,000 Hypervisors
  7. Cyclic Dependency.
     • Foundational services all moving to BC2
       ✓ ServerDB
       ✓ DNS
       ✓ NTP
       ✓ DHCP
       ✓ Puppet
       ✓ Vault
  8. The Problem - Disaster Recovery.
     • On-prem services depend on BC2
     • BC2 depends on on-prem services
     • How should disaster recovery work?
     • Vanilla Linux + KVM offers little guidance here
     • Off-the-shelf data protection solutions are still VMware focused
     • What to do???
  9. The Solution - Immutable Snapshots.
     • Virtual Machines
       ✓ Run VM images on NFS
       ✓ Snapshot NFS every 10 minutes
       ✓ Make snapshots immutable
     • Hypervisors
       ✓ Leverage LVM snapshots
       ✓ Image the boot drive to NFS
       ✓ Make the image immutable
  10. About NFS.
      • Created by Sun Microsystems in 1984
      • Version 3 (Jun 1995) is 30 years old
      • Version 4 (Dec 2000) is merely 25
      • Version 4.1 (Jan 2010) just turned 16
      • Version 4.2 (Nov 2016) is almost 10
      "If I'd known I was going to live this long, I'd have taken better care of myself." - Eubie Blake
  11. Why use NFS in 2026?
      • Ubiquitous support in NAS appliances
      • Broad hypervisor support (including KVM)
      • Compliance grade immutability
      • Snapshots
      • Deduplication
      • Malware Scanning
      • Ransomware Detection
      "Had I known what you were going to do with it I would never have invented it!" - Sir Robert Watson-Watt (RADAR Pioneer)
  12. Network Design.
      [Diagram: ITDR Rack containing Data Switch 1, Data Switch 2, NFS Appliance, and Hypervisors 1 to 5]
      • Rack local storage networking
      • 100Gb bandwidth per-node
      • Only accessible to hypervisors
      • Private VLAN for NFS traffic
      • IPv4 Link Local Addressing
      • Caching, timeouts, locking disabled
  13. VM Snapshot Process.
      [Diagram: VM1, VM2, VM3 on a Hypervisor; /var/lib/libvirt/images; NFS Appliance; NFS v3]
      • VM image mountpoint mapped to NFS volume
      • Hypervisor configuration is otherwise unchanged
      • NFS appliance takes atomic snapshots every 10 minutes
      • Snapshots are crash consistent (no quiescing of VMs)
      • Compliance mode prevents snapshot deletion
  14. Hypervisor Snapshot Process.
      • Hypervisor boot drive snapshots are taken every 10 minutes
      • A sparse image identical to the boot drive is created on NFS
      • Partitions and volumes are cloned with ~1 second granularity
      • A map of the resulting sparse file is generated for fast restore
      • Compliance mode immutability is set on the output files
      [Diagram labels]
      Hypervisor Boot Drive
        ✓ Partition Table
        ✓ Boot Partition
        ✓ LVM Volumes
      NFS Appliance
      DR Image
        ✓ Image File
        ✓ Map File
        ✓ Log File
      NFS v3
      NFS v4.2
  15. Hypervisor Snapshot Granularity.
      • Hypervisor snapshots are not atomic
      • Boot partitions are small
        ✓ Copies take < 1 second
      • LVM snapshots are fast
        ✓ Snaps take < 1 second for all volumes
        ✓ Snaps are deleted once copied
      • Point-in-time granularity of ~1 second for each boot drive snapshot
      Boot Drive
        efi partition          200M
        boot partition         1000M
        / (root_snap)          64G
        /home (home_snap)      64G
        /var (var_snap)        64G
        /var/log (log_snap)    64G
        /tmp (tmp_snap)        16G
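
The ordering on this slide (snapshot every volume back-to-back, copy afterwards, then delete the snapshots) can be sketched as a command plan in Python. This is only an illustration under stated assumptions: the volume group name `vg0`, the logical volume names, the destination directory `/mnt/dr`, and the 1G copy-on-write size are all hypothetical, and writing one image file per volume with `dd conv=sparse` is a simplification of the whole-drive image the talk describes. The function only builds argument lists; it executes nothing.

```python
def plan_boot_drive_snapshot(volumes, vg="vg0", dest="/mnt/dr", cow_size="1G"):
    """Return argv lists for one snapshot cycle; nothing is executed.

    volumes: list of (logical_volume, snapshot_name) pairs.
    vg, dest and cow_size are hypothetical defaults, not values from the talk.
    """
    create, copy, remove = [], [], []
    for lv, snap in volumes:
        # Phase 1: snapshots are created back-to-back so that all volumes
        # are captured within roughly the same second.
        create.append(["lvcreate", "--snapshot", "--name", snap,
                       "--size", cow_size, f"{vg}/{lv}"])
        # Phase 2: copy each frozen snapshot, writing holes for zero blocks.
        copy.append(["dd", f"if=/dev/{vg}/{snap}", f"of={dest}/{lv}.img",
                     "bs=1M", "conv=sparse"])
        # Phase 3: snapshots are deleted once copied.
        remove.append(["lvremove", "--yes", f"{vg}/{snap}"])
    return create + copy + remove


if __name__ == "__main__":
    for argv in plan_boot_drive_snapshot([("root", "root_snap"),
                                          ("home", "home_snap")]):
        print(" ".join(argv))
```

Keeping the three phases separate is what bounds the point-in-time skew: the slow copy step runs against frozen snapshots, so only the short creation phase contributes to the granularity.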
  16. Why NFS 4.2?
      • NFS 4.2 supports two extra whence values for lseek()
        ✓ SEEK_HOLE
        ✓ SEEK_DATA
      • Without them, generating a sparse map is not possible (holes and data look the same).
      • Hypervisor images contain thousands of holes (~99% of a 1.9T boot drive).
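
A minimal Python sketch of how a sparse map might be generated with `SEEK_DATA` and `SEEK_HOLE`. The function name `sparse_map` and the `(offset, length)` extent format are illustrative assumptions, not the presenter's tool or map file format. On a filesystem that does not report holes, the whole file comes back as a single data extent, which is the "holes and data look the same" case from the slide.

```python
import errno
import os


def sparse_map(path):
    """Return a list of (offset, length) data extents for the file at path."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Jump to the next byte that holds data at or after offset.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # only a trailing hole remains
                raise
            # Jump to the next hole; end-of-file counts as an implicit hole.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            offset = end
    finally:
        os.close(fd)
    return extents
```

Extent boundaries follow the filesystem's allocation granularity, so a reported extent can be larger than the bytes actually written.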
  17. Accelerators - Backup Path.
      • Local NVMe boot drive
      • Multi-threaded write of image file
      • Fast zeroing of image file
        ✓ Sparse allocation (< 1 second)
      • Sparse writes
        ✓ Write holes instead of zeroes
        ✓ dd conv=sparse
      • Allocation aware cloning
        ✓ LVM blocks
        ✓ e2image
        ✓ partclone
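
To illustrate "write holes instead of zeroes", here is a small single-threaded Python sketch that behaves roughly like `dd conv=sparse`: blocks that are entirely zero are skipped with a seek instead of being written. The function name `sparse_copy` and the 1 MiB default block size are assumptions made for the example; the talk does not show its implementation.

```python
import os


def sparse_copy(src_path, dst_path, block_size=1 << 20):
    """Copy src to dst, leaving holes where source blocks are all zero.

    Returns the number of bytes actually written.
    """
    zero_block = bytes(block_size)
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if block == zero_block[:len(block)]:
                # Skip ahead without writing; the gap becomes a hole.
                dst.seek(len(block), os.SEEK_CUR)
            else:
                dst.write(block)
                written += len(block)
        # If the source ends in zeroes, extend the file to its full size.
        dst.truncate(dst.tell())
    return written
```

The destination reads back byte-for-byte identical to the source, but only the non-zero blocks consume space or network bandwidth.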
  18. Accelerators - Restore Path.
      • NFSv3 with nconnect=16
      • Multi-threaded read of image file
      • Fast zeroing of boot drive
        ✓ nvme format (< 1 second)
      • Sparse reads (map file)
        ✓ Read data / skip holes
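
The restore idea (read only the data extents listed in a map, skip the holes, and do it from several threads) can be sketched in Python as follows. The function name `restore_from_map`, the in-memory list of `(offset, length)` extents, and the worker and chunk defaults are hypothetical; the talk does not describe its map file format. The target is assumed to exist already and to be zeroed, as it would be after the fast format step.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def restore_from_map(image_path, target_path, extents, workers=4, chunk=1 << 20):
    """Copy only the listed (offset, length) extents from image to target.

    The target must already exist and be zero-filled. Returns bytes copied.
    """
    src = os.open(image_path, os.O_RDONLY)
    dst = os.open(target_path, os.O_WRONLY)

    def copy_extent(extent):
        offset, length = extent
        end = offset + length
        copied = 0
        while offset < end:
            # pread/pwrite take explicit offsets, so threads share the fds safely.
            buf = os.pread(src, min(chunk, end - offset), offset)
            if not buf:
                break  # image is shorter than the map claims
            view = memoryview(buf)
            pos = offset
            while view:
                n = os.pwrite(dst, view, pos)
                view = view[n:]
                pos += n
            offset += len(buf)
            copied += len(buf)
        return copied

    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(copy_extent, extents))
    finally:
        os.close(src)
        os.close(dst)
```

With an image that is mostly holes, the amount of data read over the network is governed by the extents in the map rather than by the nominal size of the drive.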
  19. Lessons Learned.
      • Snapshots of Bare Metal can be fast!
      • NFSv3 is still a workhorse
      • NFSv4 is powerful but needs care
      • POSIX APIs and tooling still solve problems that have not been solved elsewhere
      • Disaster recovery without expensive proprietary software is possible!
  20. Call to Action!
      NFS Vendors
      • Support v4.2!
        ✓ SEEK_HOLE
        ✓ SEEK_DATA
      • Support Immutability!
        ✓ Per-file invocation (atime)
        ✓ Compliance mode is not optional
      Virtualization Vendors / Cloud Providers
      • Support open source backup and restore
      • Many large gaps vs. commercial tooling
  21. Sidebar Image Credits.
      • Setting Expectations: Wikimedia Commons - Allen Beaulieu
      • Systemic Dependency: XKCD - Dependency
      • The Problem - Disaster Recovery: Wikimedia Commons - EU, Copernicus Sentinel-2 imagery
      • The Solution - Immutable Snapshots: Wikimedia Commons - 1930s Leica advert in Time magazine
      • About NFS: Wikimedia Commons - U.S. Library of Congress
      • Why NFS in 2026?: Imperial War Museum - Catalogue number CH 15337
      • Why NFS 4.2?: Wikimedia Commons - Cat Chapman
      • Lessons Learned: Booking.com Media Library - Hufton+Crow / UNStudio
      • Call to Action: Wikimedia Commons - Gallard
      • No Attribution: Peter Buschman - Personal Library