Premday #3 - Adopting Direct Liquid Cooling (DLC) in production

Criteo & BNP Paribas share their experience of using Direct Liquid Cooling (DLC) in production.

Premday

June 08, 2026

Transcript

  1. Who are we?
    BNP Paribas: Bank, Insurance & Finance
    • 25k+ servers (plus a private, externally managed cloud) in 200+ colocations and 6 owned datacenters
    • DLC deployment:
      ▪ Fully operational in 4 HPC sites
      ▪ Under study for the BNPP-owned London site; being deployed in the first BNPP-owned French site by Apr. 2027, and in a second one by 2029
      ▪ All new sites are contractually DLC-ready for 3 years (RFP requirement)
    Criteo: Adtech business
    • 30k servers, 10 datacenters
    • DLC MVP running for over a year
      ▪ Goal: is DLC production-ready?
      ▪ 8 racks
      ▪ Servers: base nodes, GPUs, fast IO
    • New production DLC datacenter in Japan opening in 2026
    PremDay 2026

  2. Context
    • Deploying a DLC infrastructure on very uniform, high-density, standardized hardware is less complex, but it covers only a marginal part of the IT scope
    • After the quick wins, it is now time to shift from never-touched, monolithic DLC clusters for HPC (the easy case) to a more heterogeneous hardware deployment covering a wider scope
    • Potential blockers: colocations hosting legacy contracts, not initially designed for DLC
    In other words, how can a CSP convert its ongoing operations to DLC servers and racks?

  3. Why DLC?
    Market trend
    • Rising TDP and core count
    • Density is an enabler for better efficiency
    Challenges
    • Densification increases cooling requirements
    • DLC becomes a requirement
    Opportunity
    • DLC enables a better PUE, which translates into savings
    • Better cooling should also lead to fewer failures

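The PUE opportunity can be turned into a back-of-the-envelope estimate. PUE is total facility power divided by IT power, so for a fixed IT load the facility overhead saved is the IT power multiplied by the PUE difference. The sketch below assumes a constant IT load over a whole year; the load, both PUE values and the energy price in the example are hypothetical placeholders, not figures from the talk.

```python
# Hours used to annualize a steady power saving (non-leap year).
HOURS_PER_YEAR = 8760


def annual_savings(it_load_kw: float, pue_before: float, pue_after: float,
                   price_per_kwh: float) -> tuple[float, float]:
    """Return (energy saved in kWh/year, cost saved per year).

    PUE = total facility power / IT power, so for a constant IT load the
    facility draw drops by it_load_kw * (pue_before - pue_after).
    """
    if min(pue_before, pue_after) < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    saved_kw = it_load_kw * (pue_before - pue_after)
    saved_kwh = saved_kw * HOURS_PER_YEAR
    return saved_kwh, saved_kwh * price_per_kwh


# Hypothetical inputs: 1 MW of IT load, PUE going from 1.5 to 1.2,
# energy priced at 0.15 currency units per kWh.
kwh, cost = annual_savings(1000.0, 1.5, 1.2, 0.15)
print(f"{kwh:,.0f} kWh/year saved, {cost:,.0f} per year")
```

The saving scales linearly with both the IT load and the PUE gap, so the same function can be reused per site with that site's own measured values.
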
  4. DLC in colocation
    State of the market
    • DLC is mature for monolithic HPC deployments
    • Colocations are building their first-generation DLC rooms
    Challenges
    • Our rooms host several generations of hardware from various providers
    • How do we convert our operations from air cooling to DLC?
    • Is DLC ready for heterogeneous scale-out deployments?
    • Can different hardware and DLC infrastructure suppliers coexist, and how?

  5. DLC Servers
    High density
    • High core count and high memory
    • High TDP: target around 1.5 kW per rack unit
    Mix of air/liquid cooling
    • Processors, and sometimes DIMMs, are liquid cooled
    • The water loop is entirely passive
    • Often as many fans as the air-cooled version
    • Target: 60-80% of the heat into the water, 20-40% into the air
    The air/liquid cooling ratio is a key design point

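To see what the 60-80% liquid target means for a single machine, the sketch below splits a server's power into the heat captured by the water loop and the heat left to the room's air. It assumes all electrical power ends up as heat and treats the liquid fraction as a fixed input; the 1.5 kW server in the example is the slide's per-rack-unit target applied to a hypothetical 1U server.

```python
from dataclasses import dataclass


@dataclass
class HeatSplit:
    liquid_w: float  # heat captured by the liquid-cooled cold plates
    air_w: float     # residual heat the server fans push into the room


def split_heat(server_power_w: float, liquid_fraction: float) -> HeatSplit:
    """Split server heat between water and air (assumes power drawn == heat)."""
    if not 0.0 <= liquid_fraction <= 1.0:
        raise ValueError("liquid_fraction must be between 0 and 1")
    liquid = server_power_w * liquid_fraction
    return HeatSplit(liquid_w=liquid, air_w=server_power_w - liquid)


# Hypothetical 1U server at the 1.5 kW/U target, at both ends of the
# 60-80% liquid target range.
for fraction in (0.6, 0.8):
    s = split_heat(1500.0, fraction)
    print(f"{fraction:.0%} liquid: {s.liquid_w:.0f} W to water, {s.air_w:.0f} W to air")
```

The air share is what the room's air cooling still has to absorb, which is why the ratio matters as much as the total power.
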
  6. DLC Racks
    High density
    • 40x DLC servers
    • Target around 50-100 kW per rack
    Liquid cooling
    • PG25 coolant (25% propylene glycol in water)
    • Manifolds
    • Cooling Distribution Unit (CDU)
    • Rear Door Heat Exchanger (optional)
    • A plumber is needed when installing the rack

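Combining the server and rack figures gives a quick sanity check on density: 40 servers at the per-rack-unit target should land inside the 50-100 kW window. The sketch assumes 1U servers (so per-server power equals the per-U target) and ignores switches, PDUs and in-rack CDU pumps; both are simplifications, not details from the talk.

```python
# Per-rack target range stated on the slide, in kW.
RACK_TARGET_KW = (50.0, 100.0)


def rack_power_kw(servers: int, rack_units_per_server: int, kw_per_unit: float) -> float:
    """Total IT power of a rack, assuming every server draws the per-U target."""
    return servers * rack_units_per_server * kw_per_unit


def within_target(power_kw: float, target: tuple[float, float] = RACK_TARGET_KW) -> bool:
    """True if the rack power falls inside the inclusive target range."""
    low, high = target
    return low <= power_kw <= high


# 40 servers, assumed 1U each, at 1.5 kW per rack unit.
power = rack_power_kw(servers=40, rack_units_per_server=1, kw_per_unit=1.5)
print(power, within_target(power))  # prints: 60.0 True
```

Changing the assumed server height to 2U would double the result to 120 kW and push it outside the target range, which is a quick way to see how sensitive rack density is to the form factor.
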
  7. DLC Room/Cage
    Colocation
    • The room is owned by the facility
    • Today, mostly air-cooled designs retrofitted for DLC
    Mix of air/liquid cooling
    • The facility provides a redundant pair of End-of-Row CDUs
    • Around 1 MW of cooling capacity each
    • A pre-defined air-cooling capacity for the room
    • Most retrofitted cages rely on the datacenter chilled-water loop rather than a tempered loop (leading to a higher PUE)
    The air/liquid cooling ratio is contractually fixed for the lifetime of the room

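Because the room's air/liquid ratio is fixed by contract, the number of DLC racks a cage can host is bounded by whichever resource runs out first. The sketch below assumes the redundant End-of-Row pair is operated N+1, so only one CDU's ~1 MW counts as usable liquid capacity, and takes the air-cooling capacity, rack power and liquid fraction as inputs; the 300 kW of air capacity in the example is a hypothetical figure.

```python
import math


def max_racks(liquid_capacity_kw: float, air_capacity_kw: float,
              rack_power_kw: float, liquid_fraction: float) -> tuple[int, str]:
    """Return (max rack count, limiting resource) for a room with fixed capacities."""
    if rack_power_kw <= 0 or not 0.0 <= liquid_fraction <= 1.0:
        raise ValueError("invalid rack power or liquid fraction")
    liquid_per_rack = rack_power_kw * liquid_fraction
    air_per_rack = rack_power_kw - liquid_per_rack
    # A resource that a rack does not use at all never limits the count.
    by_liquid = math.floor(liquid_capacity_kw / liquid_per_rack) if liquid_per_rack > 0 else math.inf
    by_air = math.floor(air_capacity_kw / air_per_rack) if air_per_rack > 0 else math.inf
    if by_liquid <= by_air:
        return int(by_liquid), "liquid"
    return int(by_air), "air"


# Assumed N+1 operation: only one End-of-Row CDU (~1 MW) counts as usable.
USABLE_LIQUID_KW = 1000.0
# Hypothetical contractual air-cooling capacity of the cage.
AIR_CAPACITY_KW = 300.0

print(max_racks(USABLE_LIQUID_KW, AIR_CAPACITY_KW, rack_power_kw=60.0, liquid_fraction=0.75))
# prints: (20, 'air')
```

With these hypothetical inputs the cage runs out of air cooling before liquid capacity, which illustrates how a ratio fixed for the room's lifetime can constrain which hardware generations it can host.
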
  8. SLA Challenges
    Status
    • DLC is not a plug-and-play solution
    • Circuit and CDU maintenance is delegated to datacenter services suppliers (after short training)
    • No prior relationship between hardware vendors and datacenter operators
    • SLA and contract terms have to be negotiated for each deployment
    • CDU (in-row and in-rack) monitoring and management has to be integrated into the datacenter services supplier's BMS and/or our monitoring/alerting systems
    Expectation
    • Installing a DLC rack should be as easy as installing an air-cooled one

  9. Responsibilities...
    Status
    • No one wants to share coolant (secondary) loops
    • Hardware from various vendors cannot be mixed in the same rack
    • Even mixing different generations from the same vendor is not always possible
    • Hardware suppliers require in-rack CDUs to define a demarcation point (this implies rack-level granularity and a 3% efficiency loss)
    Expectation
    • There should be greater standardization
    • It should be possible to mix hardware on the same coolant loop; sharing coolant should be no different from sharing chilled air

  10. Physical Setup
    Plumbing
    • A lot of upfront alignment
    • Each site is somewhat custom, with different pipe-fitting and connector standards
    • Half a day of work per rack
    Expectation
    • Adding a rack should be plug and play

  11. Thermal testing
    Automated testing
    • Orchestrated using hwbench
    • Lots of graphs across various scenarios
    Cooling behavior
    • LC components always remain at low temperature (<50°C)
    • Fans stay in idle mode (<20%) under nominal conditions
    • Voltage regulator (VR) and rear-IO modules run hot (>80°C)

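The observed cooling behavior can be encoded as automated checks on collected telemetry, so a regression in a new hardware generation shows up as a finding rather than a graph someone has to read. The sketch below does not use hwbench's actual output format: the `Sample` structure, component names and the `check_thermal_run` helper are hypothetical, and the thresholds simply mirror the observations listed above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    component: str       # hypothetical name, e.g. "cpu0" or "vr0"
    liquid_cooled: bool  # True for parts on a cold plate
    temp_c: float


LC_MAX_C = 50.0          # LC components stayed below 50°C
FAN_IDLE_MAX_PCT = 20.0  # fans stayed below 20% in nominal conditions
HOT_SPOT_C = 80.0        # VR and rear-IO modules ran above 80°C


def check_thermal_run(samples: list[Sample], fan_duty_pct: list[float]) -> list[str]:
    """Return findings for one run; an empty list means nominal behavior."""
    findings = []
    for s in samples:
        if s.liquid_cooled and s.temp_c >= LC_MAX_C:
            findings.append(f"{s.component}: liquid-cooled part at {s.temp_c}°C")
        elif not s.liquid_cooled and s.temp_c > HOT_SPOT_C:
            # Expected on VR/rear-IO, but tracked as a hot spot to watch.
            findings.append(f"{s.component}: air-cooled hot spot at {s.temp_c}°C")
    peak_fan = max(fan_duty_pct, default=0.0)
    if peak_fan >= FAN_IDLE_MAX_PCT:
        findings.append(f"fans left idle range: {peak_fan}%")
    return findings


# Hypothetical run: one cold-plate CPU and one hot voltage regulator.
run = [Sample("cpu0", True, 42.0), Sample("vr0", False, 85.0)]
for finding in check_thermal_run(run, fan_duty_pct=[12.0, 15.0]):
    print(finding)
```
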
  12. New failure modes
    CDU failures
    • End-of-row CDUs are redundant
    • In-rack CDUs are only partially redundant
    • How are servers notified of a CDU failure?
    Loss of an in-rack CDU
    • CPUs throttle but stay alive thanks to thermal inertia
    • Fan power consumption increases a lot
    • Power peaks once coolant flow is restored

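Why throttling buys time after losing an in-rack CDU can be illustrated with a lumped thermal model: once the coolant stops circulating, the heat the CPUs still dissipate goes into warming the local thermal mass (cold plates and stagnant coolant), so the time before a temperature limit is reached is roughly t = C·ΔT / P. This is a deliberate simplification that ignores the air path and any residual heat exchange, and the heat capacity, temperatures and power values below are hypothetical placeholders rather than measurements from the talk.

```python
def seconds_to_limit(heat_capacity_j_per_k: float, start_temp_c: float,
                     limit_temp_c: float, net_heat_w: float) -> float:
    """Time for a lumped thermal mass to heat from start to limit at constant net power."""
    if net_heat_w <= 0:
        return float("inf")  # the mass is not heating up at all
    return heat_capacity_j_per_k * (limit_temp_c - start_temp_c) / net_heat_w


# Hypothetical thermal mass of cold plates plus stagnant coolant, in J/K.
C = 8_000.0
for label, power_w in (("full power", 800.0), ("throttled", 300.0)):
    t = seconds_to_limit(C, start_temp_c=45.0, limit_temp_c=95.0, net_heat_w=power_w)
    print(f"{label}: ~{t:.0f} s before reaching the limit")
```

In this model the time to the limit is inversely proportional to the dissipated power, so each reduction from throttling stretches the window proportionally.
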
  13. Monitoring the CDUs
    Status
    • CDU operability is poor
    • Local and remote UIs are not very useful
    • No modern remote monitoring protocol, only SNMP
    • We need a modern Redfish implementation
    • The embedded controller is full of security holes
    Expectation
    • Parity with a full-featured BMC
    • Ideally open source (OpenBMC for CDUs)

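Until CDUs expose a BMC-grade interface, their telemetry has to be normalized and fed into the existing alerting pipeline, whether it was scraped over SNMP or, eventually, Redfish. The sketch below covers only that normalization and threshold step; the metric names, units, limits and the CDU identifier are hypothetical, and it does not implement any SNMP or Redfish client.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Rule:
    metric: str
    warn_below: Optional[float] = None
    crit_below: Optional[float] = None
    warn_above: Optional[float] = None
    crit_above: Optional[float] = None

    def evaluate(self, value: float) -> Severity:
        def breached(below: Optional[float], above: Optional[float]) -> bool:
            return (below is not None and value < below) or (above is not None and value > above)
        if breached(self.crit_below, self.crit_above):
            return Severity.CRITICAL
        if breached(self.warn_below, self.warn_above):
            return Severity.WARNING
        return Severity.OK


# Hypothetical rules; real limits would come from the CDU and hardware vendors.
RULES = (
    Rule("supply_temp_c", warn_above=40.0, crit_above=45.0),
    Rule("flow_lpm", warn_below=60.0, crit_below=40.0),
)


def evaluate_cdu(cdu_id: str, readings: dict[str, float]) -> list[tuple[str, str, Severity]]:
    """Return (cdu_id, metric, severity) for every rule that is not OK.

    A metric missing from the readings is reported as CRITICAL: losing
    visibility on a CDU is treated as an incident in itself.
    """
    alerts = []
    for rule in RULES:
        value = readings.get(rule.metric)
        severity = Severity.CRITICAL if value is None else rule.evaluate(value)
        if severity is not Severity.OK:
            alerts.append((cdu_id, rule.metric, severity))
    return alerts


print(evaluate_cdu("rack12-cdu", {"supply_temp_c": 42.0}))
```

Treating missing data as critical reflects the slide's point that CDU monitoring is currently weak: silence from a CDU should page someone rather than look healthy.
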
  14. Support & Repair
    Lifecycle operation
    • Non-standardized practices regarding coolant QA
    • What is expected after the initial 5-year maintenance/support contract?
    Certification
    • Only certified personnel from hardware vendors can perform repairs
    • Even to replace simple components in a server
    • Very disruptive to our repair operations (contractor-based)
    Expectation
    • There should be no difference between the service models of air-cooled and DLC servers

  15. Density is challenging in itself
    Hardware
    • DLC is mostly available for servers, not for lower-density devices
    • When will network devices and storage arrays be compatible with DLC infrastructure, so that the setup is uniform across the board?
    Performance optimization
    • Deep and highly non-uniform memory hierarchy
    • Known issues, but now at a bigger scale
    SLOs & SLAs
    • Higher density means rethinking our SLOs/SLAs
    • How do we manage losing a rack at that scale?

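The SLO question can be framed as blast radius: the share of fleet capacity lost when a single rack (or its in-rack CDU) goes down. The sketch below uses the 30k-server fleet and the 40 servers per DLC rack mentioned earlier in the deck; the 20-server air-cooled rack it compares against and the 1% capacity-loss budget are hypothetical, and it assumes all servers contribute equal capacity.

```python
import math


def rack_blast_radius(servers_per_rack: int, fleet_servers: int) -> float:
    """Fraction of fleet capacity lost with one rack, assuming identical servers."""
    if not 0 < servers_per_rack <= fleet_servers:
        raise ValueError("servers_per_rack must be between 1 and fleet_servers")
    return servers_per_rack / fleet_servers


def racks_within_budget(budget_fraction: float, servers_per_rack: int,
                        fleet_servers: int) -> int:
    """Whole racks that can be down at once without exceeding a capacity-loss budget."""
    max_servers_down = math.floor(budget_fraction * fleet_servers)
    return max_servers_down // servers_per_rack


FLEET = 30_000  # servers
# 40 = DLC rack from the deck; 20 = hypothetical air-cooled baseline.
for per_rack in (40, 20):
    print(f"{per_rack} servers/rack: {rack_blast_radius(per_rack, FLEET):.3%} per rack, "
          f"{racks_within_budget(0.01, per_rack, FLEET)} racks within a 1% budget")
```

Doubling the servers per rack halves the number of simultaneous rack failures a fixed budget can absorb, which is one concrete reading of "higher density means rethinking our SLOs/SLAs".
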
  16. Takeaways
    • Deploying a DLC infrastructure in colocation is sometimes painful
    But...
    • DLC is the near future of cooling
    • Elements to consider: compatibility between miscellaneous hardware, CDUs and piping standards
    Three main pain points could be fixed:
    • DLC piping and CDU connectivity: norms and/or an emerging industry standard
    • Hardware suppliers could agree on sharing the same secondary coolant circuit
    • DLC product lines should extend to lower-density devices (network, storage arrays, appliances…)