sites 2014 1st Bunker DC 2021 1st DC Germany –250kW 2024 1st DC Norway -400kW 2026 3 regions –1MW 100% green Energy PUE < 1.2 2017 1st Private Cage –50kW
data loss Wrong PSU sizing → fire Customs & logistics hell Loading dock too small Racks tipped over Supply chain disaster Wrong disk swapped Global power outage ...also: rack flooded from above · NIC overheating to 100°C · CPU throttling · Dead on Arrival ˜1% · Security incidents Firmware-upgrade campaigns
100 TB to 500 PB — the phased approached for 99.999999999% (11 nines) durability 2014 —2018 MySQL 100 TB Emails & attachments stored in MySQL. Humble start. 2018 —2024 Ceph on bare metal 1PB → 30+ PB per cluster Multiblob to kill 192×overhead on kB blobs. EC 4+2 → 8+3 once 3,000 OSDs × 1% = 12-day MTBF. 12 to 24 HDD machinces 2024 —today Ceph on K8s 300 PB raw 3 regions. Operations at scale, on Kubernetes since end-2024. Reach 65PB per cluster limit 2025 —today Custom cold tier +200PB 50% cheaper than Ceph Low EC over 3 regions Ceph for hot blob (<3 mo). Older data → tiering on higher TB per CPU/RAM platform NEXT What’s next Ceph full flash · TiKVfrom kB blobs · HDD JBOD + SSD buffer · power-optimized drives · SMR · tape-to-glacier We optimized in 3 dimensions Less Raw TB / User TB –Lower $/TB –Lower Watt/TB
What it actually costs us 50–60% hardware — 25–40% electricity Both grow more expensive over time. Buying more is not a strategy. Workload-shaped • kB blobs, billions per cluster. 192×overhead off-the-shelf —we engineer around it (multiblob, OMAP). • ceph-ansible → cephadm → rook SLAs that drive every choice • N+1 metro · survive any DC + 2 machines/site – 11nines durability • p99 50 ms hot data (<3 mo) • p99 100 ms for all data What we ask vendors now Density — TB / U, JBOD-friendly, toolless design Power per TB — embrace SMR, low-power profiles, cooling efficiency Lifecycle data — we plan around it Co-design — tiered systems: HDD JBOD + SSD buffer + tape Five years ago we asked for more drives. Today we ask vendors to design with us.
you grow 100× Average 3% DOA rate $1.5M of Dead HW at current scale 5 days to 3 months to fix a DOA part! RMA pipelines aren’t built for our scale — or our urgency. Hidden cost per failed unit Hardware immobilized · OPEX overhead · rack space wasted Cascade on deployment timelines Burn-in 48 h · provisioning hrs · cluster-add hrs · one fail → re-batch → k$ loss This increases my TCO. I bake it into every order!
humans handle it. At scale for velocity and cost-efficiency, Standardization + Automation Where we came from 50 SKUs for a team of 10 Startup mode: build fast, ship to market. Optimized for flexibility and cost Humans absorbed the heterogeneity. What scale forced Fewer SKUs — dual vendors per platform Automation everywhere Inventory · burn-in · failed-unit registry · provisioning Tight integration Facilities compute data — not siloed Standardization isn’t ideology —it’s what makes scaling possible through automation.
quality . Lifecycle visibility One workload fits all Designed for the median. kB blobs & large physics files miss the spec. Procurement, not partnership Vendors sell SKUs. We want to co-design hardware. We’re not asking for a different vendor. We’re asking for a different conversation. Firmware validation→ a major manual workload OEM “latest and best” firmware reduces testing flexibility Known firmware issues create recurring engineering overhead From 5-7 years to 7-10 years
legacy solution OPEX −20% Energy efficiency · tool-less L11 maintenance Technology leadership + DLC Faster next-gen CPU onboarding · Direct Liquid Cooling (another −20% OPEX) · OpenBMC standardization Supply chain control Direct Direct procurement, multi-trusted manufacturers Component-level partnerships for strategic parts Two infrastructures. One open specification. Meet us at OCP.