Presented at IBM TechXchange Summit Japan 2025 (December 3, 2025) by Daisuke Hiraoka (IBM Japan) and Shoichiro Sakaigawa (Getworks Inc.).
This session covers a proof-of-concept (PoC) deployment of GPU observability in a container-style data center in Yuzawa, Niigata, Japan. Using NVIDIA DCGM Exporter and the OpenTelemetry Collector, the team achieved real-time monitoring of 8 NVIDIA H200 GPUs. The resulting visibility guided workload rebalancing that reduced GPU power consumption by approximately 80% (5,520 W → 1,062 W) and lowered GPU temperature by 35°C.
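As a quick arithmetic check, the sketch below computes the relative power reduction from the two reported figures. It also shows the implied mean power per GPU, assuming (our assumption, not stated in the session materials) that 5,520 W and 1,062 W are totals across all 8 GPUs. The helper name `percent_reduction` is hypothetical and used only for illustration.

```python
def percent_reduction(before_w: float, after_w: float) -> float:
    """Return the relative reduction from before_w to after_w, in percent."""
    if before_w <= 0:
        raise ValueError("before_w must be positive")
    return (before_w - after_w) / before_w * 100.0


# Reported GPU power before and after workload rebalancing.
BEFORE_W = 5520.0
AFTER_W = 1062.0
# Assumption: the figures above are totals over all 8 H200 GPUs.
NUM_GPUS = 8

reduction = percent_reduction(BEFORE_W, AFTER_W)
print(f"Reduction: {reduction:.1f}%")  # 80.8%, i.e. "approximately 80%"
print(f"Mean per GPU: {BEFORE_W / NUM_GPUS:.1f} W -> {AFTER_W / NUM_GPUS:.1f} W")
```

The exact figure is about 80.8%, which is consistent with the rounded "approximately 80%" above.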
The presentation also explores the facility's groundwater-assisted cooling infrastructure, renewable energy integration, and the roadmap toward PUE/WUE visibility and ESG reporting.
This PoC has since moved to production (January 2026). The extended pipeline, which adds SNMP-based cooling telemetry from in-row air coolers and liquid-cooling coolant distribution units (CDUs), is the subject of our KubeCon + CloudNativeCon Japan 2026 submission.
Speakers:
- Shoichiro Sakaigawa — IBM Champion, System Manager / AI Expert, Getworks Inc.
- Daisuke Hiraoka — IBM Champion, Advisory Automation Technical Specialist, IBM Japan