(Presented at DevOpsDays 2026)
After a system goes live, “Day 2” operations pose two big challenges. The first is rapid, minute-by-minute firefighting; the second, and often tougher, is writing a thorough post-incident Root Cause Analysis (RCA).
Under tight SLAs, we spend most of our energy putting out fires, only to pour a great deal of effort afterward into reconstructing the timeline, consolidating logs, and drafting the post-incident report.
This talk skips the hype of fully autonomous AI operations and instead focuses on a pragmatic approach: using the Model Context Protocol (MCP) to connect your ops tools so an LLM can read real data from the incident context and help engineers quickly generate a structured RCA report.
Let AI handle the drudgery of data gathering and formatting, while we invest our precious time in fixing issues and improving the architecture.