

PremDay
April 07, 2025

PremDay #2 - CSP Hardware Management Challenges: a Return of Experience

Server monitoring is usually done with Redfish and IPMI.
Elyès Zekri from Scaleway details how a mid-size scaler sees such tooling and protocols.



Transcript

  1. Contents

    1. HW Platform Management (State of the Art)
    2. HW Platform Management at Scaleway
    3. HW Platform Management Challenges
  2. Hardware Platform Management (State of the Art)

    Architecture diagram, focusing on out-of-band and in-band BMC interfaces (legend: Protocol/Standard, Component/Device, BMC Interface, SW/FW):
    • Outside the box: Remote HW Management (Redfish Client, IPMI Client, OEM Clients) → Baseboard Management Controller (BMC) through the network interface (out-of-band: TCP/IP, UDP, ...) using Redfish, IPMI, CIM, OEM, SNMP, rKVM …
    • Inside the box: Host System (OS, BIOS, UEFI, with Redfish Client, IPMI Client, OEM clients) → BMC through host interfaces (in-band: KCS, Serial, Eth-o-USB, ...) using Redfish, IPMI, MCTP … OEM
    • Inside the box: BMC → Auxiliary Management Controllers (GPU tray MC, Power MC … CPLD) through MC-to-MC interfaces (PCIe, SMBus/I2C, ...) using Redfish, MCTP, PLDM …
    • Inside the box: BMC → Devices & Peripherals (Accelerators, Fans, Storage controllers, NICs, DPUs, PSUs … RoT) through physical interfaces (PCIe, SMBus/I2C, I3C, RBT, ...) using MCTP, PLDM, SPDM, NCSI, FRU, SMBIOS …
  3. Hardware Platform Management at Scaleway

    • Scaleway handles multi-vendor hardware (servers, NICs, disks, ...)
      ◦ Multiple interfaces, data sources and standards (e.g. IPMI, Redfish, FRU specification)
      ◦ Several tools and dependencies (ipmitool, racadm, ilorest, sum, ...)
      ◦ Complex integration, automation, maintenance, ...
      ◦ Impact on production process efficiency
    → We need to rely on simpler and more interoperable vendor hardware management systems
  4. Hardware Management across the Deployment Process at Scaleway

    HW production stage, with its HW management operations (team in charge):
    • Assembly & Conformity Checking (HW Tooling)
      ◦ HW components inventory
      ◦ HW sanity & conformance check
      ◦ FW update
      ◦ FW initial configuration
      ◦ HW asset database seeding
    • HW Discovery & Pre-configure (HW Bootstrapping)
      ◦ Host OS provisioning
      ◦ FW/SW product-specific configuration
      ◦ DCIM system seeding
    • Operating & Servicing (SRE & Support)
      ◦ HW administration (power control, remote access, FW update, ...)
      ◦ HW monitoring
      ◦ HW diagnostics and maintenance
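Inventory and asset database seeding map naturally onto standard Redfish properties. As a hedged sketch, the following turns a `ComputerSystem` resource into a flat record using only schema-defined properties (`Manufacturer`, `Model`, `SerialNumber`, `BiosVersion`, `ProcessorSummary`, `MemorySummary`); the `AssetRecord` type and its field names are hypothetical and do not describe Scaleway's actual asset schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class AssetRecord:
    """Hypothetical asset-database row; field names are illustrative only."""
    uri: str
    manufacturer: Optional[str]
    model: Optional[str]
    serial_number: Optional[str]
    bios_version: Optional[str]
    cpu_count: Optional[int]
    memory_gib: Optional[float]


def to_asset_record(uri: str, system: Dict[str, Any]) -> AssetRecord:
    """Map a Redfish ComputerSystem resource to an AssetRecord.

    Only standard (non-Oem) properties are read, so the same mapping applies
    to any vendor whose implementation follows the ComputerSystem schema.
    Properties a BMC does not report become None instead of raising.
    """
    cpu = system.get("ProcessorSummary") or {}
    memory = system.get("MemorySummary") or {}
    return AssetRecord(
        uri=uri,
        manufacturer=system.get("Manufacturer"),
        model=system.get("Model"),
        serial_number=system.get("SerialNumber"),
        bios_version=system.get("BiosVersion"),
        cpu_count=cpu.get("Count"),
        memory_gib=memory.get("TotalSystemMemoryGiB"),
    )
```

Tolerating missing properties keeps seeding going on partial implementations, at the cost of having to detect the resulting gaps downstream.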
  5. The spiderweb

    HW Administration issues high-level management commands (cmd_1, cmd_2, ..., cmd_n) to a HW abstraction layer made of third-party and generic tools, which in turn drives the vendor platforms:
    • Third-party and generic tools set: Vendor_1 tool_1, Vendor_1 tool_2, Vendor_2 tool_1, Vendor_2 tool_2, ..., Vendor_n tool_1, Vendor_n tool_2, Redfish generic tool, IPMI generic tool
    • Vendor platforms: Vendor_1 Platform_x (REDFISH_1, IPMI_1, OEM), Vendor_2 Platform_y (REDFISH_2, IPMI_2, OEM), ..., Vendor_n Platform_z (REDFISH_n, IPMI_n, OEM)
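To make the maintenance cost of the spiderweb concrete, here is a toy model of such an abstraction layer: a hand-maintained table from (vendor, high-level command) to a tool invocation. Every vendor, tool and command name is a hypothetical placeholder mirroring the slide's labels, not a real binary or CLI; the point is that the table grows with vendors × commands, and any missing pair is a command the layer cannot serve.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Builds the argv of the tool implementing a command for a given host.
ArgvBuilder = Callable[[str], List[str]]

# Hypothetical wiring: every (vendor, high-level command) pair is mapped by
# hand to some vendor-specific or generic tool. Names are placeholders only.
SPIDERWEB: Dict[Tuple[str, str], ArgvBuilder] = {
    ("Vendor_1", "power_status"): lambda host: ["vendor1_tool_1", "--host", host, "power"],
    ("Vendor_1", "fw_update"): lambda host: ["vendor1_tool_2", "--host", host, "update"],
    ("Vendor_2", "power_status"): lambda host: ["ipmi_generic_tool", host, "power"],
    ("Vendor_2", "fw_update"): lambda host: ["vendor2_tool_1", host, "flash"],
    ("Vendor_n", "power_status"): lambda host: ["redfish_generic_tool", host, "power"],
    # ("Vendor_n", "fw_update") is deliberately missing: no tool covers it yet.
}


def resolve(vendor: str, command: str, host: str) -> List[str]:
    """Return the argv to run `command` against `host` built by `vendor`."""
    builder = SPIDERWEB.get((vendor, command))
    if builder is None:
        raise LookupError(f"no tool wired for {command!r} on {vendor}")
    return builder(host)


def coverage_gaps(vendors: List[str], commands: List[str]) -> List[Tuple[str, str]]:
    """List every (vendor, command) pair the abstraction layer cannot serve."""
    return [pair for pair in product(vendors, commands) if pair not in SPIDERWEB]
```

Each new vendor adds a row of entries to write, validate and keep alive, which is the burden the next slide quantifies.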
  6. Challenge n°1 - A multitude of HW Management tools

    Statements
    • At Scaleway, 15 different tools are dedicated solely to interacting with BMCs (including OEM and generic tools)
    • Some tools can only be used in-band, while others only work out-of-band
    Implications
    • OS compatibility issues
    • Long-term support constraints
    • Each new tool to be integrated requires full functional validation
    Best practices
    1. Implement Redfish services
    2. Use unified, interoperable, modern standards-based tools (e.g. a generic Redfish client)
    3. Provide tools that can be used both in-band and out-of-band
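Best practices 2 and 3 can be illustrated together: a generic Redfish client only needs a base URL, so the same code can reach the BMC out-of-band over the management network or in-band through the Redfish Host Interface (e.g. Eth-o-USB). This is a minimal sketch under stated assumptions: the pluggable `transport` callable is hypothetical, and the default `urllib` transport omits the authentication and TLS settings a real BMC requires. Reading the reset target from the resource's `Actions` rather than hard-coding it follows the ComputerSystem schema.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

# A transport performs one HTTP exchange and returns the decoded JSON body.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def urllib_transport(method: str, url: str, body: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Plain urllib transport. Assumption: auth and TLS settings are omitted."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw) if raw else {}


class RedfishClient:
    """Minimal generic client; in-band vs out-of-band only changes base_url."""

    def __init__(self, base_url: str, transport: Transport = urllib_transport) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def get(self, uri: str) -> Dict[str, Any]:
        return self.transport("GET", self.base_url + uri, None)

    def power_state(self, system_uri: str) -> Optional[str]:
        """Read the standard PowerState property of a ComputerSystem."""
        return self.get(system_uri).get("PowerState")

    def reset(self, system_uri: str, reset_type: str = "GracefulRestart") -> Dict[str, Any]:
        """Invoke ComputerSystem.Reset using the target the service advertises."""
        actions = self.get(system_uri)["Actions"]
        target = actions["#ComputerSystem.Reset"]["target"]
        return self.transport("POST", self.base_url + target, {"ResetType": reset_type})
```

Out-of-band, `base_url` would be the BMC's management address; in-band, it would be the address the Redfish Host Interface exposes to the host OS. Both are deployment-specific and deliberately left out.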
  7. Challenge n°2 - Mismatching implementations of the Redfish API

    Statements
    • Excessive use of OEM Redfish endpoints
    • Partial Redfish implementation (missing features)
    • Wrong Redfish implementation (inconsistent with the standard schemas)
    Implications
    • Excessive use of OEM schemas → vendor Redfish API implementations diverge too much
    • Partial Redfish → forced to use alternative tools → more tools (cf. Challenge n°1)
    • Inconsistency with the Redfish schemas → generic Redfish tools fail
    Best practices
    1. Stick as closely as possible to the standard specifications
    2. Use the official DMTF conformance tools (*) to validate Redfish services
    3. Rely on reference open-source Redfish implementations (e.g. OpenBMC) to avoid discrepancies
    (*) https://github.com/DMTF/Redfish-Interop-Validator
        https://github.com/DMTF/Redfish-Service-Validator
        https://github.com/DMTF/Redfish-JsonSchema-ResponseValidator
        https://github.com/DMTF/Redfish-Protocol-Validator
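Alongside the DMTF validators, a quick way to put a number on "excessive use of OEM" is to list every non-empty `Oem` object in a Redfish payload. The helper below is a hypothetical heuristic, not one of the DMTF tools, and it says nothing about schema conformance.

```python
from typing import Any, List


def find_oem_paths(node: Any, path: str = "") -> List[str]:
    """Return the JSON paths of every non-empty "Oem" object in a Redfish payload.

    Walks dicts and lists recursively; empty "Oem" placeholders are ignored
    because they carry no vendor-specific data.
    """
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "Oem" and value:
                found.append(child)
            found.extend(find_oem_paths(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_oem_paths(item, f"{path}/{index}"))
    return found
```

Running it over the same resource type from several vendors gives a rough, comparable indicator of how far each implementation strays from the standard properties.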
  8. Challenge n°3 - OpenBMC as a reference Firmware Stack

    Statements
    • Too many tools (cf. Challenge n°1)
    • Discrepancies with respect to the standard specifications (cf. Challenge n°2)
    • OpenBMC is not being adopted fast enough by vendors
    Implications
    • Big impact on integration and maintenance complexity, and on production efficiency
    Best practices: rely on OpenBMC
    • Option 1: closed-source OpenBMC-based FW
      😀 FW should inherit a best-of-breed, full Redfish implementation
      😀 discrepancies with other OpenBMC-based APIs should be minimal
      😞 no customization possibilities
    • Option 2 (preferred): open-source OpenBMC-based FW
      😀 same pros as the previous option
      😀 possibility to extend/customize the FW
      😀 in-house troubleshooting and bug fixes
      😞 may need to handle transfer of ownership on the customer side
  9. In a nutshell

    • Challenge 1 - Too many tools: rely as much as possible on standard tools + use Redfish
    • Challenge 2 - Mismatching Redfish implementations: minimize OEM usage + use OpenBMC
    • Challenge 3 - Need for more OpenBMC-based FW stacks: 2 possible release models
      ➢ proprietary/closed source
      ➢ open source (preferred)
  10. I Have a Dream…

    High-level Mgmt queries (query 1, query 2, ..., query n) go through a single Redfish generic tool/API, which talks to every vendor platform through standard Redfish only:
    • Vendor_1 Platform_x: REDFISH
    • Vendor_2 Platform_y: REDFISH
    • . . .
    • Vendor_n Platform_z: REDFISH
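In this target setup, each high-level query becomes one code path over standard resources. Assuming every BMC follows the standard schema, a single function can fan the same query out to all vendors at once; the `get(bmc, uri)` callable and the BMC identifiers are hypothetical, and per-BMC error handling is left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Hypothetical transport hook: (bmc, uri) -> decoded JSON body.
Getter = Callable[[str, str], Dict[str, Any]]


def power_states(get: Getter, bmcs: List[str], workers: int = 8) -> Dict[str, Dict[str, str]]:
    """Run one high-level query, "power state of every system", on every BMC.

    The same code path serves every vendor because it only touches the
    standard Systems collection and the standard PowerState property.
    """

    def query(bmc: str) -> Dict[str, str]:
        systems = get(bmc, "/redfish/v1/Systems")
        return {
            member["@odata.id"]: get(bmc, member["@odata.id"]).get("PowerState", "Unknown")
            for member in systems.get("Members", [])
        }

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with bmcs.
        return dict(zip(bmcs, pool.map(query, bmcs)))
```

Adding a vendor here means adding an address to `bmcs`, not wiring new tools, which is the contrast with the spiderweb.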