I have come to bury the BIOS, not to open it: The need for holistic systems

Talk given at OSFC 2022 on September 19, 2022 in Gothenburg, Sweden. Video: https://vimeo.com/756050840

Bryan Cantrill

September 19, 2022

Transcript

  1. I have come to bury the
    BIOS, not to open it
    The need for holistic systems
    Bryan Cantrill
    Oxide Computer Company

  2. OXIDE
    In the beginning…
    • In the beginning, computing systems were holistic: hardware and
    software were designed together, to work with one another
    • System software was delivered with the computer – but often had to be
    newly developed for a new machine
    • Requiring new software for new hardware created schedule delays, viz.
    OS/360 and The Mythical Man Month
    • With the advent of Unix, system software became portable – it could be
    ported rather than developed de novo for each new computer…

  3. OXIDE
    Unix spreads – and feuds
    • The portability of Unix accelerated the minicomputer and workstation
    revolutions, with each manufacturer having its own variants
    • The systems of this era remained broadly holistic: the hardware and
    software were (broadly) designed with the other in mind
    • …but despite the original ethos of Unix, the variants themselves
    remained entirely proprietary – and the differences between them ignited
    the Unix Wars of the 1980s and 1990s

  4. OXIDE
    Elsewhere, homebrew computing
    • With the rise of the microcomputer, computing became much more
    broadly available in the 1970s – but nearly absurd variety with respect to
    hardware made software standardization challenging
    • The hardware-specific half of CP/M – the dominant microcomputer OS
    of the 1970s – was the Basic Input Output System, and could be
    delivered separately
    • This gave rise to hardware vendors delivering ROMs that contained
    platform enablement code roughly standardized as a “System BIOS”

  5. OXIDE
    The IBM PC era
    • With the emergence of the IBM PC – and its de facto standardization by
    Compaq – the system software/BIOS split became irreconcilable
    • Essential hardware enabling-software was driven into the BIOS
    • The BIOS interface became what system software bound to – it became
    the definition of “compatibility”
    • Worse, the software components on both sides of the BIOS/OS divide
    were nearly exclusively proprietary, serving to harden the boundary

  6. OXIDE
    It gets worse: SMM
    • In order to be able to implement system software functionality delivered
    by the hardware – e.g., laptop suspend and resume – system management
    mode was invented
    • SMM allows effectively arbitrary, hidden code execution at arbitrary times
    without even allowing system software awareness
    • This is the opposite of a holistic system: it is one that has been
    deliberately and perniciously divided!

  7. OXIDE
    EFI/UEFI
    • All of this might have been fine had x86 remained relegated to personal
    computing…
    • …but Intel and AMD out-executed the RISC vendors in the 2000s,
    forcing PC constructs into the server space
    • Starting with (ill-fated) Itanium, Intel introduced EFI in an attempt to
    modernize…

  8. OXIDE
    UEFI: What might have been
    Source: Beyond BIOS: Developing with the Unified Extensible Firmware Interface

  9. OXIDE
    UEFI: What happened instead
    • While its goals were laudable, UEFI was overconstrained
    • In particular, the need for legacy and Windows compatibility required
    UEFI to support all past abstractions
    • UEFI has become the worst of all worlds: complicated, proprietary
    software that remains at once isolated from – yet also still entirely
    entangled with! – system software
    • UEFI has become so entangled with lowest-level platform enablement
    that non-UEFI platforms are effectively impossible

  10. OXIDE
    It gets worse, again: Hidden cores
    • A dividend of Moore’s Law: formerly discrete components were
    increasingly pulled first into large ASICs – and then pulled on-die into a
    system-on-a-chip
    • Especially as I/O was brought directly into the die, CPUs developed an
    increasing number of non-architectural cores to manage it
    • But these cores are hidden to system software – the operating system
    is being confined to an increasingly narrow slice of the true hardware
    capabilities of the system…

  11. OXIDE
    …which is not lost on everyone!
    Timothy Roscoe, OSDI 2021 Keynote, It's Time for Operating Systems to Rediscover Hardware

  12. OXIDE
    The battle for non-architectural cores
    • Roscoe (rightfully) calls this a “security catastrophe”
    • The non-architectural cores are – on x86 CPUs anyway – entirely
    proprietary, with all of the concomitant problems; that the system is
    “open source” is increasingly a myth
    • Roscoe correctly identifies the problem, but understates the severity:
    this isn’t a retreat of Linux – it is a resurgence of proprietary operating
    systems, wrapping themselves in firmware

  13. OXIDE
    Is an open source BIOS the answer?
    • An open source BIOS is certainly valuable and laudable – but if history is
    any guide, it is also not sustainable
    • The problem is not (merely) the proprietary BIOS – it is the ubiquity of the
    abstraction that splits our stack into open and proprietary halves
    • The presence of a deeply proprietary platform enablement layer allows
    for wildly complicated SoCs to have vast, undocumented elements – the
    implementation of the firmware has become the documentation!
    • We need a different model

  14. OXIDE
    The need for (a return to) holistic systems
    • The platform enablement boundary as we know it today is largely
    vestigial – it serves to create abstractions that are broadly unnecessary
    • We need systems that obliterate these boundaries – that are rather
    holistic systems in which software and hardware are co-designed
    • Resetting system state over the course of booting is not holistic!
    • Holistic systems require us to be willing to take up Roscoe’s challenge
    and adopt SoC specificity in our operating systems

  15. OXIDE
    Oxide’s approach
    • At Oxide, we are taking a from-scratch, rack-scale approach to
    server-side computing, with AMD Milan-based sleds of our own design
    • We do not have a traditional BMC, but rather a fit-to-purpose service
    processor (an STM32H753) and RoT (LPC55S28), both running our own
    (Rust-based, open source) OS, Hubris (see Cliff Biffle’s OSFC 2021 talk!)
    • Our approach is holistic but open
    • Could we develop a truly holistic system on x86?

  16. OXIDE
    Aside: AMD Details
    • On AMD, the Platform Security Processor (PSP) is a non-architectural
    core that executes proprietary software to perform system initialization –
    including DRAM training
    • System management controller (SP in our case) puts the PSP payload
    into SPI flash and brings the CPU out of reset
    • The PSP will perform its initialization and eventually vector into host
    software executing on the bootstrap core (BSC)
    • Historically, post-PSP initialization has been done by AMD’s AGESA
    firmware – which makes a holistic system impossible
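
The sequence on this slide can be read as a small state machine. The Rust sketch below is only an illustration of that ordering: the type and variant names are hypothetical labels, not AMD or Oxide identifiers, and nothing here talks to real hardware.

```rust
/// Steps of the boot sequence described on the slide, in order.
/// Variant names are illustrative labels only (assumptions, not vendor terms).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BootStep {
    /// The service processor (SP) puts the PSP payload into SPI flash.
    SpWritesPayloadToSpiFlash,
    /// The SP brings the CPU out of reset.
    SpReleasesCpuFromReset,
    /// The PSP performs its initialization, including DRAM training.
    PspInitializes,
    /// The PSP vectors into host software on the bootstrap core (BSC).
    HostSoftwareRunsOnBsc,
}

impl BootStep {
    /// Returns the step that follows this one, or `None` after the last step.
    pub fn next(self) -> Option<BootStep> {
        match self {
            BootStep::SpWritesPayloadToSpiFlash => Some(BootStep::SpReleasesCpuFromReset),
            BootStep::SpReleasesCpuFromReset => Some(BootStep::PspInitializes),
            BootStep::PspInitializes => Some(BootStep::HostSoftwareRunsOnBsc),
            BootStep::HostSoftwareRunsOnBsc => None,
        }
    }

    /// Names the component that is in control during this step.
    pub fn actor(self) -> &'static str {
        match self {
            BootStep::SpWritesPayloadToSpiFlash | BootStep::SpReleasesCpuFromReset => "SP",
            BootStep::PspInitializes => "PSP",
            BootStep::HostSoftwareRunsOnBsc => "host",
        }
    }
}

/// Walks the whole sequence starting from the first step.
pub fn boot_sequence() -> Vec<BootStep> {
    let mut steps = Vec::new();
    let mut current = Some(BootStep::SpWritesPayloadToSpiFlash);
    while let Some(step) = current {
        steps.push(step);
        current = step.next();
    }
    steps
}
```

In this model the final step is where either AGESA or, in a holistic system, the operating system itself takes over; the sketch deliberately stops there.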

  17. OXIDE
    Challenge #1: Initialization
    • To implement holistic boot, system software must perform the activities
    historically done by AGESA
    • Modern CPUs are very complicated! Post-PSP initialization includes
    configuring I/O interconnects, core complexes, etc.
    • For AMD Milan, this specifically includes DXIO engine configuration,
    NBIO PCIe strapping, hotplug configuration
    • The software that implements this level of initialization has
    historically been written by the CPU vendor; these interfaces are not
    always documented thoroughly – if at all!

  18. OXIDE
    Challenge #2: Boot Phasing
    • Payload that boots from PSP is size-constrained to ~13MB
    • Stage-based approaches (e.g., oreboot + LinuxBoot) use Linux drivers
    to load (and execute) a production kernel
    • This necessitates a pseudo-reset of the system – as well as the creation
    or emulation of an interface (e.g., ACPI) to pass system state to later
    stages
    • We instead adopt a phase-based approach whereby part of the
    system is loaded from SPI NOR and is able to load the remainder from
    SSDs – but the system is never discarded
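
The contrast between stage-based and phase-based boot can be sketched as a toy model in Rust. Every type, field, and value below is hypothetical and chosen only for illustration; this is not Oxide's, oreboot's, or LinuxBoot's code. `Handoff` stands in for a narrow interface such as ACPI, and the explicit `drop` stands in for the pseudo-reset.

```rust
/// State that system software accumulates during early boot.
/// The fields are illustrative assumptions, not a real kernel structure.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SystemState {
    pub memory_mib: u64,
    pub trained_links: Vec<String>,
    pub loaded_from: Vec<&'static str>,
}

/// Narrow handoff interface between stages; a stand-in for something
/// like ACPI tables. Only what fits in here survives a stage boundary.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Handoff {
    pub memory_mib: u64,
}

/// Early initialization performed by whatever is loaded from SPI NOR.
pub fn early_init() -> SystemState {
    SystemState {
        memory_mib: 1024,
        trained_links: vec!["nvme0".to_string()],
        loaded_from: vec!["spi-nor"],
    }
}

/// Stage-based boot: the first stage's state is discarded, and the next
/// stage rebuilds its view of the machine from the handoff alone.
pub fn staged_boot() -> SystemState {
    let first_stage = early_init();
    let handoff = Handoff { memory_mib: first_stage.memory_mib };
    drop(first_stage); // models the pseudo-reset between stages
    SystemState {
        memory_mib: handoff.memory_mib,
        trained_links: Vec::new(), // must be rediscovered by the new stage
        loaded_from: vec!["ssd"],
    }
}

/// Phase-based boot: the same system keeps running and simply loads the
/// remainder of itself, so nothing it already knows is thrown away.
pub fn phased_boot() -> SystemState {
    let mut state = early_init();
    state.loaded_from.push("ssd");
    state
}
```

In the staged path, anything not expressible in `Handoff` is lost and has to be rediscovered; in the phased path the state from early initialization is still present after the remainder has been loaded.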

  19. OXIDE
    Holistic booting!
    • Helios is our illumos derivative that includes the Oxide bhyve-based
    hypervisor – and runs our rack-wide control plane
    • We have holistic Helios booting on our EVT compute sleds, including all
    necessary functionality for platform initialization (I/O, SMP, etc.)
    • Phased boot has enough in SPI to be able to import ZFS pools from M.2
    devices
    • Helios – along with all Oxide-authored software – will be open source
    when we ship our first racks at the end of the year!

  20. OXIDE
    Towards holistic systems
    • Holistic systems have clear advantages in terms of reliability, security,
    observability, manageability, sustainability, etc.
    • Based on our experience to date, holistic systems are challenging to
    implement but emphatically attainable
    • Documentation from microprocessor vendors is essential; they
    have much to gain by encouraging more software on their platforms!
    • Oxide may represent the first open, holistic server-side system in the
    post-PC x86 era – but it is unlikely to be the last!
