Services for 3D rendering, CFD and other numerical simulation, banks, …
Focused on low carbon footprint and general resource efficiency, in particular through waste heat reuse.
Heat reuse
• We build chassis/racks, tailor-made to optimize heat reuse
• We install them in a distributed way, on small/medium sites, where heat can be used
• We use them for our HPC cloud, operating them remotely, over the Internet or private networks, to offer our cloud services
• We reuse the heat: we sell heat to heat networks, for domestic applications, industries, swimming pools, …
Financial simulation • 3D animation • biotechnology • CFD
Certification and labels
80% reduction in carbon footprint of our clients' calculations on average
More than 3,000 customers
2010 creation date
70 employees
2.1 GWh of computer heat recovered in 2022
More than 70K computing cores
Heavy focus on heat reuse | How it shapes what we do
Capture valuable heat, as hot as possible, where it's needed, using existing infrastructure:
district or domestic heating networks, existing buildings, existing heating facilities
Heavy focus on heat reuse | How it shapes what we do
Capture valuable heat, using existing infrastructures ⇒ custom rack / chassis design, custom cooling
• Efficient: extract as much energy as possible, up to 95% efficiency (see the sizing sketch below)
• Hot: domestic hot water needs > 60°C for sanitary purposes
• Operating on tap water, can be included in existing water circuits
• Plumbing can be operated by a plumber
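As a rough illustration of what these constraints mean in practice, here is a back-of-the-envelope sizing sketch. Only the up-to-95% capture figure and the >60°C target come from the slides; the rack power, loop temperatures and resulting flow are assumed example values.

```python
# Back-of-the-envelope sizing sketch: how much usable heat a rack yields
# and what hot-water flow it can sustain. The 95% capture figure is from
# the slide; the rack power and water temperatures are assumed examples.
IT_POWER_KW = 20.0         # assumed electrical power drawn by one rack
CAPTURE_EFFICIENCY = 0.95  # up to 95% of that power recovered as heat
T_IN, T_OUT = 40.0, 65.0   # assumed water loop temperatures (°C)
CP_WATER = 4.186           # specific heat of water, kJ/(kg·K)

heat_kw = IT_POWER_KW * CAPTURE_EFFICIENCY         # ≈ 19 kW of usable heat
flow_kg_s = heat_kw / (CP_WATER * (T_OUT - T_IN))  # Q = m_dot * cp * ΔT
# For water, 1 kg ≈ 1 L, so kg/s converts directly to L/s.
print(f"{heat_kw:.1f} kW recovered, ≈ {flow_kg_s * 3600:.0f} L/h of water at {T_OUT}°C")
```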
Heavy focus on heat reuse | How it shapes what we do
Where heat is needed, using existing infrastructures ⇒ widespread geo-distribution on many small and medium-sized sites
Very few people need 1 GW of heat @ 65°C in one place; lots of people need 500 kW of heat @ 65°C
Some impacts from custom cooling: schematics, thermics, mechanics
Some impacts from distribution: all remote, secure
We're also (mostly) normal people: most needs are not specific to us
Custom cooling | Know what needs to be cooled, and by how much
Our cooling is based on cold plates that replace vendor cooling systems: stripping the machine down and replacing, removing fans and cooling blocks, and putting ours instead
• Not just the CPU, but also smaller chips, NICs, …
• The hotter it is, the more value the heat has: high-temp components
• Precise schematics: knowing where every chip that may need cooling is located, to the mm. Measuring is possible, but time-consuming / expensive.
• Stable design: no moving elements around between revisions. If unavoidable, we need to know.
• Thermal requirements of everything: how hot it can get, what power it dissipates.
• Mechanical requirements related to cooling, e.g. pressure between CPU and heat sink.
• Early on: trial & error is possible, but time-consuming / expensive, and finding errors late in the cycle is very expensive.
Knowing early on ⇒ faster time-to-prod for the HW ⇒ faster time-to-market for the cloud service ⇒ more informed decisions, so better TCO
Taking it apart should be easy
• OCP's tool-less "touch the green" motto is WONDERFUL
• Better yet: we just want the motherboard, naked. We don't need a chassis or heat sinks.
• Waste we're not using ⇒ degrades environmental footprint
• Waste we have to dispose of ⇒ cost
• Human work to strip it ⇒ cost
• We WILL remove vendor cooling: "warranty void if cooling is taken apart" is almost a no-go for us. No warranty void!
Warmer is more valuable
• We prefer CPUs with a high TCase, as we can heat water hotter, enabling applications without a heat pump
• All other components too: one single 1¢ component dying at 50°C can be a no-go for us. We prefer the 10¢ version that can take 90°C.
IT'S NOT JUST US: "traditional" DCs benefit too. Higher temperature also means easier free cooling and less AC, so low PUE without adiabatic cooling (water waste).
GEO DISTRIBUTION | Wide-scale distribution of small units
• We already have 10s of sites
• Soon we'll have 100s
• In 10 years, 1000s or maybe 10,000s
IT'S NOT JUST US: "edge computing" is a trend that won't disappear, independent from heat reuse. Small/medium scalers with 1,000s of edge DCs will be a normal thing.
• Small/medium actors will have 10s of them
• Big players will have 100s
• Hyperscalers may have 1000s
On-site intervention is not an option
• Even just one, not only at scale: pushing a button is very costly
• It must be fully remote, it must work, it must be forgiving
• BMC, BIOS, NICs, … firmware and config: firmware upgrades must be rock-solid
Varying levels of physical security across sites
• Small/medium sites: physical security not worth the cost
• Using existing facilities: physical security not always possible to retrofit
• Using heat facilities: sometimes no choice but to share access with "heating" personnel
• Need to compensate with logical security: for some customers and some usages, it will be acceptable if done properly. Creates manageability challenges.
We want to move the trust decision away from the machine; we don't want to deploy crypto/PKI everywhere.
Something like TPM remote attestation, with an implementation and tooling making it manageable at scale: zero-touch, reset-able, forgiving.
• Stable: it must not change randomly, and not change if we reflash the same FW.
• Predictable: PCR values should be computable offline. Given firmwares, config, and physically plugged HW, we should be able to know the PCR values even without looking at the real thing.
Build new FW ⇒ compute new PCR values ⇒ flash FW ⇒ check PCR values (see the sketch below)
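A minimal sketch of what "computable offline" could look like, assuming SHA-256 PCR banks and whole-file measurements. Real firmware measurements follow the TCG event-log format and may hash specific regions rather than whole files, and the file names below are hypothetical.

```python
# Predicting a TPM 2.0 PCR value offline from a list of expected
# measurements (hashes of firmware components), assuming SHA-256 PCRs
# that start at all-zeros and are extended in a known order.
import hashlib

def extend(pcr: bytes, measurement: bytes) -> bytes:
    # TPM extend operation: new_PCR = H(old_PCR || measurement)
    return hashlib.sha256(pcr + measurement).digest()

def predict_pcr(artifacts: list[str]) -> bytes:
    pcr = bytes(32)  # SHA-256 PCRs start at 32 zero bytes
    for path in artifacts:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).digest()
        pcr = extend(pcr, digest)
    return pcr

# Hypothetical build artifacts, measured in boot order:
expected = predict_pcr(["bios.rom", "bootloader.bin", "kernel.img"])
print("expected PCR:", expected.hex())
# Compare against the quoted PCR value reported by the real machine.
```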
Even though “traditional” DCs have a more consistent physical security scheme, an additional layer of logical security is still desired. In any case, it must be manageable at scale.
We do a bit of unusual stuff:
• We strip down servers
• Custom cooling at machine level
• Integrate with unusual (for a DC) cooling/heating networks
• Operate in unusual (for a DC) and "non-consistent" places
At the end of the day, we're just a cloud provider operating edge DCs. Other on-prem actors do all or part of the "weird" stuff that we do. We have 90% the same challenges: make the servers work, and make them work at scale.
ON-PREM OPERATOR | Make the servers work
Mostly important during bringup. Can be useful after firmware upgrades.
• We hit complex problems
• We know how to provide valuable bug reports, minimal reproductions, run experiments, …
• We did try to turn it off and on again
ON-PREM OPERATOR | Make the servers work
If firmwares are open source, with a usable build+flash+debug toolchain, and hardware specs easily accessible and understandable, we can fix things ourselves. Even more so if based on Linux, which we're familiar with.
OpenBMC, Coreboot, OSS firmwares, and the tooling
ON-PREM OPERATOR | Make the servers work
We deal with a diversity of hardware. For manageability, we need to bring back unity as low in the stack as possible. Redfish is not enough: too much diversity.
OpenBMC, Coreboot, OSS firmwares
• Modular: swappable BMC modules 😍
• PXE-bootable BMCs 😍😍
ON-PREM OPERATOR | Make the servers work AT SCALE
• The HTTP API is the ubiquitous building block
• We can use it to integrate with off-the-shelf tools, or homemade tools (inventory, deployment, …), as in the sketch below
⇒ If management interfaces are API-first, we'll like it
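A minimal sketch of the kind of homemade inventory tooling an API-first management interface enables, assuming a Redfish-compatible BMC. The BMC address and credentials are hypothetical placeholders.

```python
# Collect basic inventory from a Redfish-compatible BMC over its HTTP API.
import requests

BMC = "https://bmc42.example.net"   # hypothetical BMC address
AUTH = ("admin", "secret")          # hypothetical credentials

def list_systems() -> list[dict]:
    """Return model / serial / power state for every system behind the BMC."""
    root = requests.get(f"{BMC}/redfish/v1/Systems", auth=AUTH, verify=False).json()
    inventory = []
    for member in root.get("Members", []):
        system = requests.get(f"{BMC}{member['@odata.id']}", auth=AUTH, verify=False).json()
        inventory.append({
            "model": system.get("Model"),
            "serial": system.get("SerialNumber"),
            "power": system.get("PowerState"),
        })
    return inventory

if __name__ == "__main__":
    for entry in list_systems():
        print(entry)
```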
ON-PREM OPERATOR | Make the servers work AT SCALE
We already know how to deploy software at scale. Firmware should be no different:
• open source with tooling, so we can customize it, build it, package it
• tooling and deployment APIs so we can deploy it at scale with the usual deploy strategies (blue/green, canary, rolling, …), as sketched below
CI/CD for firmware. Server fleet management is a devops job and should use devops tools and flows.
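A minimal sketch of a canary-style firmware rollout driven over HTTP, reusing the same flow we would use for any software deployment. The fleet list, flash_firmware() endpoint and health_check() endpoint are hypothetical placeholders, not a specific vendor API.

```python
# Canary rollout of a firmware image across a fleet of BMCs.
import time
import requests

FLEET = ["bmc01.example.net", "bmc02.example.net", "bmc03.example.net"]  # hypothetical
IMAGE_URI = "https://repo.example.net/firmware/bmc-v2.13.tar"            # hypothetical

def flash_firmware(host: str) -> None:
    # Hypothetical API-first endpoint that triggers a firmware update.
    requests.post(f"https://{host}/api/firmware/update",
                  json={"image": IMAGE_URI}, timeout=30).raise_for_status()

def health_check(host: str) -> bool:
    # Hypothetical health endpoint checked after the upgrade.
    try:
        return requests.get(f"https://{host}/api/health", timeout=10).ok
    except requests.RequestException:
        return False

def canary_rollout(fleet: list[str], canary_count: int = 1) -> None:
    canaries, rest = fleet[:canary_count], fleet[canary_count:]
    for host in canaries:
        flash_firmware(host)
    time.sleep(600)  # soak period before judging the canaries
    if not all(health_check(h) for h in canaries):
        raise RuntimeError("canary failed, aborting rollout")
    for host in rest:  # then roll out to the rest, one by one
        flash_firmware(host)
        if not health_check(host):
            raise RuntimeError(f"{host} unhealthy, pausing rollout")

canary_rollout(FLEET)
```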
ON-PREM OPERATOR | Make the servers work AT SCALE
Our target for keeping hardware is 10 years, not 3. So:
• long-term firmware upgrades (hint: if OSS, we may maintain it ourselves, alone or with others)
• upgradable crypto: for crypto chips, BMC auth, …
ON-PREM OPERATOR | Make the servers work AT SCALE
We're ready to pay a bit more for the hardware if we save on integration costs and on day-to-day and long-term operational costs, or gain an increased lifetime, …
ON-PREM OPERATOR | Make the servers work in the future
And tomorrow, we'll want things we didn't even think about. Being able to fork and modify the firmware is insurance for the future.
OpenBMC, Coreboot, OSS firmwares, and the tooling
Let's talk! Contacts
Victor - Cloud platform team
Charles - Lead of the IT team
Yoann - Lead HW Eng, Racks/Chassis design team - [email protected]
Alexis - SW Eng, Cloud platform team
Clément - CTO - [email protected]