PremDay #2 - Smells like bare metal

PremDay
April 18, 2025

Peter Buschman details how Booking.com subdivides dense hardware in a way that integrates with bare metal-based provisioning.

Transcript

  1. Agenda. Introductions (About Me • Booking.com Fast Facts • Setting Expectations) • From Blade Servers to Partitions (Bare Metal at Booking • Problem: Blade Server EOL • Solution: Server Partitions) • Conclusion (Lessons Learned • Call to Action • Q&A)
  2. About Me. Principal Engineer at Booking.com • Based in Amsterdam • 30 years of industry experience • Specialties ✓ Virtualization ✓ Data Storage ✓ Disaster Recovery • Personal ✓ Live on a houseboat ✓ Love to travel ✓ Excited to be in Paris!
  3. Fast Facts. Headquartered in Amsterdam, Netherlands • 29 million+ listings • More than 174 thousand destinations • Over 220 countries and territories • Website and support in 45 languages • 100k+ servers, VMs, and containers • 3 Availability Zones in Central/Western Europe
  4. Setting Expectations. No 🚫: Case Studies • Product Pitches • Artificial Intelligence • Misuse of "on premise". Yes ✅: Old Ideas • History Lessons • Old School Engineering
  5. Bare Metal at Booking. Majority Blade Servers ▸ 16 cores / 32 threads ▸ 128GB RAM ▸ ~2TB SSD ▸ 10Gb Network • Peak Population: ~50,000 • 2025 Population: ~30,000 • EOA since 2020 • 5-10 years old in 2025!!!
  6. Challenges in 2025. Data Center Efficiency • Legacy Applications • Blade Form Factor EOL • Platform Modernization ▸ Cloud Native ▸ Containers ▸ Virtual Machines • Timelines not aligned! ▸ Refactoring is slow ▸ Delayed migrations ▸ Legacy apps left behind
  7. Blade Server Retirement. No replacement server SKU • No appetite for new models • Doing nothing not possible • Business pressure to modernize • Limited engineering resources
  8. Et tu, Cloud? Many teams use EC2 in AWS • Acceptance level is very high • Little refactoring required • Could a similar approach work on-prem? • What would we call it?
  9. What does BC2 stand for? Booking EC2 • Booking Compute 2 • Bare Metal Compute 2 • Blade Compute 2 • Blade Center 2
  10. What is BC2? Partitioning of high-density server SKUs designed for modern cloud-native applications • Logical instances that equal or exceed the performance of legacy blade servers • 100% compatibility with legacy tooling • Virtually indistinguishable from blade servers for customers
  11. Technical Philosophy: Boring and imperfect. Dan McKinley, "Choose Boring Technology" ✓ Known failure modes ✓ Few unknown unknowns • Sir Robert Alexander Watson-Watt, "The Cult of the Imperfect" ✓ Use the 3rd best option ✓ The 2nd best comes too late ✓ The best never comes at all
  12. What is Server Partitioning? EC2 and BC2 are examples of partitioning • Partitions are logical servers in physical ones • Resources are dedicated, not shared • Hardware partitioning was the historical norm • Software partitioning is the modern reality • Partitioning vs. Virtualization ▸ No oversubscription ▸ Direct hardware access ▸ Persistent vs. ephemeral identity
  13. Anatomy of a BC2 host. 2U Server Chassis • NUMA 0: 48C / 96T, 768GB DDR5 RAM • NUMA 1: 48C / 96T, 768GB DDR5 RAM • 2 x 15.36TB U.3 NVMe (up to 7 possible) • 4 x 25Gb Shared Network Bridge
  14. BC2 Instance Recipe. CPUs (1-40) [passthrough] • 1GB Pages (16-640) [may substitute 2M] • IO Threads (1 pinch) [do not over-season] • RAW Image Files (1-10TB) [stir until thick] • Do not DISCARD the TRIMmings • Garnish with SERIAL, MAC, and IP • Serve hot and consume immediately
  15. BC2 Identification. Manufacturer and Model # are set through SMBIOS, either with virt-install:

        virt-install --sysinfo smbios,system.manufacturer=BC2,system.product=E1

    or in the libvirt domain XML:

        <sysinfo type='smbios'>
          <system>
            <entry name='manufacturer'>BC2</entry>
            <entry name='product'>E1</entry>
          </system>
        </sysinfo>
  16. Server Reboot API. pxe_boot() • power_off() • power_cycle() • power_on() • power_status() • soft_reseat() • power_cycle_bmc() [Diagram: ServerDB; BC2 Instance; Blade Server]
  18. Lessons Learned. VMs can outperform Bare Metal • Hiding NUMA has great benefits • Legacy abstractions can be very sticky • Simple architecture meets deadlines • It is possible to be too fast! ▸ DHCP race condition ▸ Vault credentials not ready yet ▸ Network inexplicably faster?
  19. Call to Action! Open Source Collaboration • Open Firmware Management • NVMe vendors please support SR-IOV! • OEMs please stop disabling SR-IOV! • Future Proof Parts / Interchangeability • Tool-less / screw-less drive caddies • Single Socket w/ Big Fans ▸ 2U / 2-node / ½ width
  20. Sidebar Image Credits. Setting Expectations: Wikimedia Commons - Marsyas • Et tu, Cloud?: Wikimedia Commons - Tony Webster • What is Server Partitioning?: Sun Enterprise™ 10000 Server - Service Views • Technical Philosophy: Imperial War Museum - Catalogue number CH 15337 • BC2 Instance Recipe: Wikimedia Commons - Lynn Gilbert • Lessons Learned: Booking.com Media Library - Hufton+Crow / UNStudio • Call to Action: Wikimedia Commons - Gallard
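On the What is Server Partitioning? slide, the difference from virtualization comes down to dedicated resources and no oversubscription. The following is a minimal sketch of what that rule means for placement, not Booking.com's scheduler: `Host`, `allocate`, `release` and `OversubscriptionError` are hypothetical names, and keeping each partition on a single NUMA node is an assumption. A request either receives CPUs nobody else holds or fails outright.

```python
from dataclasses import dataclass, field


class OversubscriptionError(RuntimeError):
    """Raised when a request could only be met by sharing CPUs or memory."""


@dataclass
class Host:
    # Hypothetical model: free physical CPU ids and free memory (GiB) per NUMA node.
    free_cpus: dict[int, list[int]]
    free_mem_gib: dict[int, int]
    partitions: dict[str, tuple[int, list[int], int]] = field(default_factory=dict)

    def allocate(self, name: str, vcpus: int, mem_gib: int) -> tuple[int, list[int]]:
        """Pin `vcpus` dedicated CPUs and `mem_gib` of memory from one NUMA node."""
        if name in self.partitions:
            raise ValueError(f"partition {name!r} already exists")
        for node in sorted(self.free_cpus):
            cpus = self.free_cpus[node]
            if len(cpus) >= vcpus and self.free_mem_gib[node] >= mem_gib:
                pinned, self.free_cpus[node] = cpus[:vcpus], cpus[vcpus:]
                self.free_mem_gib[node] -= mem_gib
                self.partitions[name] = (node, pinned, mem_gib)
                return node, pinned
        # Unlike an overcommitted hypervisor, never hand out the same CPU twice.
        raise OversubscriptionError(f"no NUMA node has {vcpus} free CPUs and {mem_gib} GiB")

    def release(self, name: str) -> None:
        """Return a partition's CPUs and memory to the NUMA node it came from."""
        node, pinned, mem_gib = self.partitions.pop(name)
        self.free_cpus[node] = sorted(self.free_cpus[node] + pinned)
        self.free_mem_gib[node] += mem_gib
```

For example, on a toy host with 8 CPUs and 64 GiB per node, two 6-CPU requests land on different nodes, and a third 4-CPU request is refused instead of being squeezed onto shared CPUs.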
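The figures on the Anatomy of a BC2 host slide can be checked against the legacy blade profile from Bare Metal at Booking (16 cores / 32 threads, 128GB RAM, ~2TB SSD). The sketch below is illustrative arithmetic only: the deck does not say how many instances Booking.com places per host, instances range from 1 to 40 CPUs rather than all being blade-sized, and the one-NUMA-node-per-instance rule and host thread reservation are assumptions.

```python
# Host figures from the "Anatomy of a BC2 host" slide.
HOST_NUMA_NODES = 2
NODE_THREADS = 96        # 48 cores / 96 threads per NUMA node
NODE_MEM_GB = 768        # DDR5 per NUMA node
NVME_DRIVE_TB = 15.36    # per U.3 drive; 2 installed, up to 7 possible

# Legacy blade profile from the "Bare Metal at Booking" slide.
BLADE_THREADS = 32       # 16 cores / 32 threads
BLADE_MEM_GB = 128
BLADE_DISK_TB = 2.0      # "~2TB SSD"


def blade_equivalents(reserved_threads_per_node: int = 0, nvme_drives: int = 2) -> dict[str, int]:
    """Blade-sized partitions per host, limited by whichever resource runs out first.

    Assumes (hypothetically) that partitions never span NUMA nodes and that
    `reserved_threads_per_node` threads per node are kept back for the host.
    """
    per_node_cpu = (NODE_THREADS - reserved_threads_per_node) // BLADE_THREADS
    per_node_mem = NODE_MEM_GB // BLADE_MEM_GB
    by_cpu = HOST_NUMA_NODES * per_node_cpu
    by_mem = HOST_NUMA_NODES * per_node_mem
    by_disk = int(nvme_drives * NVME_DRIVE_TB // BLADE_DISK_TB)
    return {"cpu": by_cpu, "memory": by_mem, "disk": by_disk,
            "host": min(by_cpu, by_mem, by_disk)}
```

With nothing reserved, CPU is the binding limit: 96 // 32 = 3 blade-sized partitions per NUMA node, 6 per host, while memory (12) and the two NVMe drives (15) leave headroom. Reserving 4 threads per node for the host drops the count to 4.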
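The BC2 Instance Recipe slide maps fairly directly onto libvirt domain XML elements: `<cpu mode='host-passthrough'/>`, 1 GiB `<hugepages>`, a single `<iothreads>`, a raw disk with `discard='unmap'` so TRIM reaches the image, and an SMBIOS serial plus a fixed MAC. A sketch under stated assumptions: `bc2_domain_xml`, the bridge name `br0`, the `vda`/virtio device choices, and reusing the manufacturer/product values from the Identification slide are illustrative, not Booking.com's actual template. The IP from the "Garnish" bullet is not part of the domain XML; presumably it is handed out by DHCP, keyed on the MAC.

```python
import xml.etree.ElementTree as ET


def bc2_domain_xml(name: str, vcpus: int, mem_gib: int, image: str,
                   serial: str, mac: str, bridge: str = "br0") -> str:
    """Render a libvirt domain definition following the "BC2 Instance Recipe" slide.

    Hypothetical helper: bridge name, disk target and virtio models are assumptions.
    """
    # Ranges from the slide: CPUs 1-40, 1 GiB pages 16-640.
    if not 1 <= vcpus <= 40:
        raise ValueError("vcpus must be within 1-40")
    if not 16 <= mem_gib <= 640:
        raise ValueError("memory must be within 16-640 GiB (1 GiB pages)")

    dom = ET.Element("domain", type="kvm")
    ET.SubElement(dom, "name").text = name
    ET.SubElement(dom, "memory", unit="GiB").text = str(mem_gib)
    hugepages = ET.SubElement(ET.SubElement(dom, "memoryBacking"), "hugepages")
    ET.SubElement(hugepages, "page", size="1", unit="G")
    ET.SubElement(dom, "vcpu", placement="static").text = str(vcpus)
    ET.SubElement(dom, "iothreads").text = "1"           # "1 pinch"
    ET.SubElement(dom, "cpu", mode="host-passthrough")   # CPUs [passthrough]

    # Identity that legacy tooling reads via SMBIOS: manufacturer, model, serial.
    system = ET.SubElement(ET.SubElement(dom, "sysinfo", type="smbios"), "system")
    for key, value in (("manufacturer", "BC2"), ("product", "E1"), ("serial", serial)):
        ET.SubElement(system, "entry", name=key).text = value
    os_el = ET.SubElement(dom, "os")
    ET.SubElement(os_el, "type", arch="x86_64").text = "hvm"
    ET.SubElement(os_el, "smbios", mode="sysinfo")       # expose <sysinfo> to the guest

    devices = ET.SubElement(dom, "devices")
    disk = ET.SubElement(devices, "disk", type="file", device="disk")
    # Raw image, guest TRIM passed through as discard, served by the one I/O thread.
    ET.SubElement(disk, "driver", name="qemu", type="raw", discard="unmap", iothread="1")
    ET.SubElement(disk, "source", file=image)
    ET.SubElement(disk, "target", dev="vda", bus="virtio")
    nic = ET.SubElement(devices, "interface", type="bridge")
    ET.SubElement(nic, "mac", address=mac)
    ET.SubElement(nic, "source", bridge=bridge)
    ET.SubElement(nic, "model", type="virtio")
    return ET.tostring(dom, encoding="unicode")
```

The returned string is the kind of document `virsh define` accepts from a file, or that libvirt-python's `conn.defineXML()` accepts directly.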
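The BC2 Identification slide works because legacy tooling reads the same SMBIOS fields on every server. Inside a Linux guest those fields appear under `/sys/class/dmi/id` as `sys_vendor` and `product_name` (the values `dmidecode -s system-manufacturer` and `dmidecode -s system-product-name` report), so an instance defined with manufacturer BC2 and product E1 presents itself as that hardware model. A minimal sketch; `read_smbios_identity` and `server_kind` are hypothetical helpers, not part of any Booking.com tool.

```python
from pathlib import Path

# Linux exposes SMBIOS/DMI system fields under this sysfs directory.
DMI_DIR = Path("/sys/class/dmi/id")


def read_smbios_identity(dmi_dir: Path = DMI_DIR) -> tuple[str, str]:
    """Return (manufacturer, product) as the running kernel reports them."""
    def read(name: str) -> str:
        try:
            return (dmi_dir / name).read_text().strip()
        except OSError:
            return ""  # missing or unreadable field: treat as unknown
    return read("sys_vendor"), read("product_name")


def server_kind(manufacturer: str, product: str) -> str:
    """Hypothetical classifier: tooling can branch on the BC2 'hardware model'."""
    if manufacturer == "BC2":
        return "bc2"
    return "other"
```

On a BC2 instance, `server_kind(*read_smbios_identity())` would be expected to return `"bc2"`; on anything else it falls through to `"other"`.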
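The Server Reboot API slide places one set of calls behind ServerDB for both BC2 instances and blade servers. One way to keep callers unaware of which they are talking to is a shared interface with per-platform backends. A sketch under stated assumptions: `RebootAPI`, `BC2Instance` and the `call` dispatcher are hypothetical names, the BC2 backend is an in-memory stand-in rather than real libvirt calls, and mapping `soft_reseat` and `power_cycle_bmc` onto a partition (an off/on cycle and a no-op) is a guess. A blade backend, not shown, would send the same calls to the blade's management controller.

```python
from abc import ABC, abstractmethod


class RebootAPI(ABC):
    """Hypothetical backend interface mirroring the Server Reboot API slide."""

    @abstractmethod
    def power_on(self) -> None: ...

    @abstractmethod
    def power_off(self) -> None: ...

    @abstractmethod
    def power_status(self) -> str: ...

    @abstractmethod
    def pxe_boot(self) -> None: ...

    @abstractmethod
    def soft_reseat(self) -> None: ...

    @abstractmethod
    def power_cycle_bmc(self) -> None: ...

    def power_cycle(self) -> None:
        # Default shared by all backends: hard off, then on.
        self.power_off()
        self.power_on()


class BC2Instance(RebootAPI):
    """In-memory stand-in; a real backend would drive libvirt instead
    (for example the equivalents of `virsh destroy` and `virsh start`)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = False
        self.boot_device = "disk"
        self.log: list[str] = []

    def power_on(self) -> None:
        self.running = True
        self.log.append(f"start {self.name} boot={self.boot_device}")
        self.boot_device = "disk"  # one-shot network boot, like a BMC boot override

    def power_off(self) -> None:
        self.running = False
        self.log.append(f"destroy {self.name}")

    def power_status(self) -> str:
        return "on" if self.running else "off"

    def pxe_boot(self) -> None:
        self.boot_device = "network"
        self.power_cycle()

    def soft_reseat(self) -> None:
        # No physical slot to reseat; assume a full off/on is the closest match.
        self.power_cycle()

    def power_cycle_bmc(self) -> None:
        # A partition has no BMC of its own; assume a logged no-op.
        self.log.append(f"bmc-noop {self.name}")


ACTIONS = {"pxe_boot", "power_off", "power_cycle", "power_on",
           "power_status", "soft_reseat", "power_cycle_bmc"}


def call(serverdb: dict[str, RebootAPI], hostname: str, action: str):
    """Dispatch the way ServerDB might: callers never learn which backend they hit."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    return getattr(serverdb[hostname], action)()
```

Because `power_cycle` lives on the base class, each backend only has to supply the primitives its platform actually has.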
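The "It is possible to be too fast!" bullets on the Lessons Learned slide describe instances finishing boot before a DHCP lease or Vault credentials were ready. The deck does not say how this was resolved; one common mitigation is to gate the dependent step on a readiness check polled with bounded exponential backoff. A minimal sketch; `wait_until_ready` and its default timings are hypothetical.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], timeout: float = 60.0,
                     initial_delay: float = 0.5, max_delay: float = 8.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `check` with exponential backoff until it passes or `timeout` elapses.

    Returns True as soon as `check()` succeeds, False once the deadline passes.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the delay up to `max_delay`.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

Injecting `sleep` and `clock` keeps the loop deterministic under test; in production the defaults (`time.sleep`, `time.monotonic`) apply, and the monotonic clock is unaffected by wall-clock adjustments.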