PremDay

Prem’Day 2024
Challenges and solutions to operate a large and diverse on-prem hardware range
Stéphane DUTILLEUL, May 16th 2024

Table of contents
01 Scaleway at a Glance
02 The Form Factor perspective
03 The Density perspective
04 The Maintainability perspective

4 Part 01 Scaleway at a glance

5 Who is Scaleway?
• A European cloud provider, founded in 1999. The most complete cloud ecosystem in Europe
• Multi-AZ redundancy
• Operates sustainable data centers in France, the Netherlands and Poland
• 2nd life: servers reused for 10 years (vs. the industry standard of 3-4), which reduces CO2 and e-waste

6 Who is Scaleway?
• Late 2023:
  ◦ Joined the OCP Community
  ◦ Opened an OCP Experience Center
  ◦ https://eu.osfci.tech/ci/

7 Who is Stéphane?
• Joined Scaleway early 2018 after 19 years at Sun Microsystems (and Oracle)
• VP Hardware Engineering
• Part of the Operations department
• Scope of responsibilities
  ◦ HW architecture / solutions and validations
  ◦ Rack Level Design
  ◦ In-house tools dev (configuration, quality, manufacturing)
  ◦ HW installed base maintenance

8 Ecosystem, portfolio
• Bare Metal
• Public Cloud
• AI

9 On-premises installed base: large …
• In quantity: tens of thousands of servers
• Geographically: operated in several locations / regions -> multi-AZ
  ◦ And different DC infrastructures
• Products: a large variety of products, hence several types of servers (intensive compute, AI, storage, commodity hardware, …)
• Sustainability / free and adiabatic cooling
• Continuous deployment

10 On-premises installed base: … and diverse
• Servers in production and maintained for 10 years
  ◦ Sometimes up to 14 years …
• Multi-vendor, with a mix of numerous generations of chassis/servers
  ◦ Up to 37 types of chassis at some point in time
• Mix of many flavors and versions of firmware
• Mix of several technologies (DDR, HDD, SSD, HW RAID, …)

11 Part 02 The Form Factor perspective

12 Challenges
• Many products serving different purposes → diversity of chassis is often a requirement
  ◦ Use several chassis for a Scaleway product
  ◦ A chassis/server must fit as many customer requirements as possible
  ◦ However, we would like to limit the number of chassis types
• One-size-fits-all is not a solution for us
  ◦ -> 1U, 2U, 3U, 4U, …
• Rear IO is not ideal
  ◦ Esp. when operating with hot aisle / adiabatic cooling

13 Challenges
• Obvious impact on the cooling capacity and power consumption
  ◦ Chassis size (2U)
  ◦ Fan size
• A lot of possible configurations / form factors
  ◦ Generally overkill
  ◦ But hard to find suitable configurations
• Personalization and customization require the capability to be flexible
  ◦ Bare metal and cloud servers don’t always have the same requirements

14 Solutions
• Front IO -> better for
  ◦ Rack-level density
  ◦ Space optimization
  ◦ Cabling
  ◦ Serviceability
  ◦ Cooling
• Configuration flexibility -> build a “catalogue”
  ◦ Diskless + bracket
    ▪ For better control over the disk and DIMM references
    ▪ For spare parts management as well
  ◦ Manufacturing Level 6 + CPU (incl. tests) in some cases

15 Solutions
• DC-MHS + DC-SCM for better chassis configurations
  ◦ Interoperability across platforms / DC-SCM
  ◦ Consistent form factors and interfaces
  ◦ Improves modularity
  ◦ And helps to limit the number of chassis “flavors”
• HW replacement → RMA
  ◦ Process TBC
  ◦ Esp. for DC-SCM / interoperability?
• Chassis/server customization / de-featuring can provide a simpler configuration
  ◦ Cheaper but also …
  ◦ Fewer components / fewer defects
  ◦ Easier maintenance

16 Part 03 The Density perspective

17 Challenges
• Various products -> density at the rack level rather than at the chassis/server (core) level
• We are not the “end customer” -> to plan power consumption, we have to predict and anticipate
  ◦ The average workload
  ◦ The usage rate
  ◦ Redundancy -> availability
  ◦ etc.
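
The prediction exercise on this slide can be sketched as a small calculation. This is only an illustration under stated assumptions, not Scaleway's method: the `ServerProfile` fields, the linear idle-to-peak power model, the wattages and the per-feed budget are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerProfile:
    """Hypothetical per-server figures; none come from the talk."""
    name: str
    peak_watts: float   # draw at full load
    idle_watts: float   # draw when powered on but unused
    avg_load: float     # expected average workload, 0.0 to 1.0
    usage_rate: float   # share of servers actually in use, 0.0 to 1.0


def expected_watts(p: ServerProfile) -> float:
    """Expected draw per server, assuming power scales linearly with load."""
    in_use = p.idle_watts + (p.peak_watts - p.idle_watts) * p.avg_load
    return p.idle_watts + (in_use - p.idle_watts) * p.usage_rate


def servers_per_rack(p: ServerProfile, feed_watts: float,
                     redundant_feeds: bool = True) -> int:
    """Servers that fit a two-feed rack power budget.

    With redundant feeds, one feed alone must carry the whole rack,
    so only a single feed's capacity is usable.
    """
    usable = feed_watts if redundant_feeds else 2 * feed_watts
    return int(usable // expected_watts(p))


if __name__ == "__main__":
    profile = ServerProfile("example", peak_watts=400, idle_watts=100,
                            avg_load=0.5, usage_rate=0.8)
    print(expected_watts(profile))
    print(servers_per_rack(profile, feed_watts=6000))
    print(servers_per_rack(profile, feed_watts=6000, redundant_feeds=False))
```

Changing `avg_load`, `usage_rate` or `redundant_feeds` shows how each of the three inputs moves the achievable rack density.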

18 Challenges
• Dual-socket MB is not a suitable solution for our business
  ◦ Dedicated to specific cases
  ◦ Blast radius
• Cooling → mix of air / DLC
  ◦ Low end -> air cooling
  ◦ High end -> DLC
• Impacts the network port costs and availability
• Standard when dealing with constraints
  ◦ Different power designs

19 Solutions
• Always define the trade-off between
  ◦ The average workload
  ◦ The usage rate
  ◦ Power redundancy
• Mix of air and DLC cooling
  ◦ Driven by the large and diverse installed base
  ◦ Product
  ◦ Cost effectiveness
  ◦ DC infra readiness / room level
  ◦ Vs. adaptability

20 Solutions
• More and more cores per CPU, with the resulting power consumption, is not THE solution
• Open Rack v3 (incl. Front IO)
  ◦ Density
  ◦ Assembly
  ◦ Maintenance / S12Y
  ◦ Power efficiency

21 Part 04 The Maintainability perspective

22 Challenges
• FW management
  ◦ Outdated IPMI interface
  ◦ Transition to Redfish is not straightforward
    ▪ For instance, in-band communication is not standardized … yet
  ◦ Compatibility / interoperability / proprietary stack
• Multi-chassis + continuous deployment
  ◦ Management using different FW flavors
  ◦ Generations, versions
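
To make the "different FW flavors, generations, versions" point concrete, here is a minimal sketch that counts how many firmware versions coexist per chassis type and component. The inventory records, field names and version strings are hypothetical; a real inventory would come from whatever in-house tooling collects it.

```python
from collections import defaultdict

# Hypothetical inventory records; hosts, chassis names and versions are invented.
INVENTORY = [
    {"host": "srv-001", "chassis": "chassis-a",
     "firmware": {"bmc": "1.2.0", "bios": "3.1"}},
    {"host": "srv-002", "chassis": "chassis-a",
     "firmware": {"bmc": "1.4.1", "bios": "3.1"}},
    {"host": "srv-003", "chassis": "chassis-b",
     "firmware": {"bmc": "2.0.0", "bios": "1.7"}},
]


def firmware_versions(inventory):
    """Map (chassis, component) to the set of firmware versions seen in the fleet."""
    seen = defaultdict(set)
    for server in inventory:
        for component, version in server["firmware"].items():
            seen[(server["chassis"], component)].add(version)
    return dict(seen)


def mixed_versions(inventory):
    """Keep only the (chassis, component) pairs running more than one version."""
    return {
        key: sorted(versions)
        for key, versions in firmware_versions(inventory).items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    for (chassis, component), versions in mixed_versions(INVENTORY).items():
        print(f"{chassis} {component}: {', '.join(versions)}")
```

With tens of chassis types deployed continuously over a decade, the number of keys in `firmware_versions` grows quickly, which is the management burden the slide describes.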

23 Challenges
• Support model is generally not appropriate
  ◦ Difficult access to engineering in the early days / validation steps
  ◦ Access to spare parts goes through the levels of support
• Access to parts
  ◦ Extending server life makes spare parts availability more challenging
  ◦ Maintaining many chassis types implies many spare part SKUs
    ▪ No compatibility across manufacturers and across chassis
    ▪ Complete systems do not help to limit the number of references

24 Challenges
• Maintain the required in-house tools to control and configure the HW
  ◦ -> tools must be agnostic to the HW flavor
• Complete systems imply no control over the disks and DIMMs
  ◦ For short- and long-term support, this is not helpful for maintenance
• "Packaging" is not ideal for sustainability
  ◦ Too many pallets / too much waste
  ◦ Shipment costs
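
One common way to keep tools agnostic to the HW flavor is to hide each flavor behind a shared interface and select the backend from the chassis type. The sketch below only illustrates that pattern and is not Scaleway's tooling: `HardwareDriver`, the two backends, the chassis names and the returned strings are all hypothetical, and no real IPMI or Redfish call is made.

```python
from typing import Callable, Dict, Protocol


class HardwareDriver(Protocol):
    """Operations every backend must offer, whatever the hardware flavor."""

    def set_boot_device(self, device: str) -> str: ...


class LegacyIpmiDriver:
    """Placeholder for an IPMI-based backend (hypothetical, no real call)."""

    def __init__(self, host: str) -> None:
        self.host = host

    def set_boot_device(self, device: str) -> str:
        return f"[ipmi] {self.host}: boot from {device}"


class RedfishDriver:
    """Placeholder for a Redfish-based backend (hypothetical, no real call)."""

    def __init__(self, host: str) -> None:
        self.host = host

    def set_boot_device(self, device: str) -> str:
        return f"[redfish] {self.host}: boot from {device}"


# Hypothetical mapping from chassis type to the backend that can drive it.
DRIVERS: Dict[str, Callable[[str], HardwareDriver]] = {
    "chassis-a": LegacyIpmiDriver,
    "chassis-b": RedfishDriver,
}


def driver_for(chassis: str, host: str) -> HardwareDriver:
    """Pick the backend for a chassis type; fail loudly on unknown hardware."""
    try:
        factory = DRIVERS[chassis]
    except KeyError:
        raise ValueError(f"no driver registered for chassis {chassis!r}") from None
    return factory(host)


def provision(chassis: str, host: str) -> str:
    """Tool-level code: identical for every chassis type."""
    return driver_for(chassis, host).set_boot_device("pxe")


if __name__ == "__main__":
    print(provision("chassis-a", "10.0.0.1"))
    print(provision("chassis-b", "10.0.0.2"))
```

Supporting a new chassis type then means adding one backend and one registry entry, while `provision` and the rest of the tooling stay unchanged.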

25 Solutions
• OpenBMC
  ◦ Native Redfish implementation
  ◦ Allows adding new features to address our own use cases
  ◦ Helps to optimize and standardize the interface / interaction with FW
    ▪ Requires in-house skills
    ▪ Helps with internal HW solution
  ◦ Pending questions
    ▪ Source code delivery and RMA (rollback)?
    ▪ Adoption and compliance?
    ▪ Transfer of ownership / signature?

26 Solutions
• Experience Center
  ◦ https://eu.osfci.tech/ci/
• M-CRPS
  ◦ Example of how to reduce the number of PSU references
• Bulk packaging (no cardboard, cords, DVD, guides, …)

27 Solutions
• Adapt the service model to be more appropriate
  ◦ Engineering access in the early days
  ◦ Spare parts available for a more appropriate period
  ◦ During “RUN” time, skip the 1st level of support to get access to spare parts
  ◦ L6 / diskless + brackets
    ▪ Helps to control the spare parts stocks and references
  ◦ Customization / de-featuring -> fewer components, fewer failures

29 Thank you!

Q&A
