3. Demonstration
4. Outcome Evaluation
5. Lessons Learnt
6. Issues & Future Work

Open Question
• How can we make IT operations more efficient with AI?
• How should we respond when management tells us to "improve the business with AI"?
plane
[Figure: on-call operation architecture]
Components: Cloud Control plane, Grafana, Alert manager, PromQL API (on TSDB), Slack, PagerDuty, OpenSearch DSL, LogQL API, Metrics PF, Log PF, OpenSearch Dashboard, Cloud Data plane, System status check API, System status cache API, Other ops PF, Operation Run Books

Me while on-call
• Check system metrics via Grafana's visualization
• Log in to check raw system status (e.g. docker ps, df -h)
• Confirm data plane status (e.g. a hypervisor down due to a hardware issue)
• Check entity relations (e.g. Virtual Router A3 is running on Hypervisor D14)
• Get a phone call when an error or critical issue occurs and operator handling is needed
• Run raw PromQL queries to investigate the incident in more detail
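The "raw PromQL query" step above can be sketched roughly as follows. This is a minimal sketch, assuming the Metrics PF exposes the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The base URL, the `instance` label value, and the helper name `build_instant_query_url` are hypothetical and do not come from the slides.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real Metrics PF address is not given in the slides.
PROM_BASE_URL = "http://prometheus.example.internal:9090"


def build_instant_query_url(promql: str, base_url: str = PROM_BASE_URL) -> str:
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    # urlencode percent-encodes braces, quotes and spaces in the PromQL text.
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Assumed example: is the scrape target for Hypervisor D14 reporting as down?
url = build_instant_query_url('up{instance="hypervisor-d14"} == 0')
print(url)
```

Only the URL is built here, so the sketch runs without network access. An on-call tool would send a GET request to it and read the JSON response.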
Zone -c State -c "Updated At"
+---------+-------+----------------------------+
| Zone    | State | Updated At                 |
+---------+-------+----------------------------+
| kks-az2 | down  | 2026-02-13T20:00:00.000000 |
+---------+-------+----------------------------+
$ ping hv00000-xxxx.xxx-az2.xxxx.xxx
PING hv00000-xxxx.xxx-az2.xxxx.xxx (10.0.0.1): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2

e.g. Private Cloud operation (1): Hypervisor Down
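The manual check above (read the status table, spot the zone whose State is down) could be automated along these lines. This is a minimal sketch, assuming the CLI keeps printing the same `+---+` / `| ... |` table layout. The helper name `parse_ascii_table` is hypothetical, and the sample text is copied from the output shown above.

```python
def parse_ascii_table(text: str) -> list[dict[str, str]]:
    """Parse a '+---+' / '| a | b |' style table into a list of row dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the '+-----+' border lines
    ]
    if not rows:
        return []
    header, *body = rows  # the first '|' line holds the column names
    return [dict(zip(header, row)) for row in body]


SAMPLE = """\
+---------+-------+----------------------------+
| Zone    | State | Updated At                 |
+---------+-------+----------------------------+
| kks-az2 | down  | 2026-02-13T20:00:00.000000 |
+---------+-------+----------------------------+
"""

down_zones = [r["Zone"] for r in parse_ascii_table(SAMPLE) if r["State"] == "down"]
print(down_zones)
```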
Incident start → Incident identified → Incident recovered
MTTI improvement / MTTR improvement
step2: 2025.03.01
step3: 2025.04.01
Only approve / Including human command execution
MTTR improvement
• Other Important Key Metrics
  ◦ Number of incidents (100 incidents × 10 min TTR improvement ≈ 16.7 h saved)
  ◦ Coverage & Works for AI Awareness
    ▪ ROI: "With AI, but with far too much time spent AI-enabling" <<< "No AI"
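The metrics above can be worked through in a few lines. This is a minimal sketch, assuming both time-to-identify and time-to-recover are measured from the incident start. The slides do not state this definition, and the timestamps below are purely illustrative. For the savings estimate, 100 incidents × 10 minutes is 1,000 minutes, or roughly 16.7 hours.

```python
from datetime import datetime, timedelta


def time_to_identify(start: datetime, identified: datetime) -> timedelta:
    """Time from incident start until the incident is identified."""
    return identified - start


def time_to_recover(start: datetime, recovered: datetime) -> timedelta:
    """Time from incident start until recovery (assumed definition)."""
    return recovered - start


def hours_saved(incidents: int, minutes_saved_per_incident: float) -> float:
    """Total operator hours saved across a number of incidents."""
    return incidents * minutes_saved_per_incident / 60


# Illustrative timestamps only, not taken from a real incident.
start = datetime(2026, 2, 13, 20, 0)
identified = datetime(2026, 2, 13, 20, 12)
recovered = datetime(2026, 2, 13, 20, 45)

tti = time_to_identify(start, identified)
ttr = time_to_recover(start, recovered)
saved = hours_saved(100, 10)
print(tti, ttr, round(saved, 1))
```

Averaging `tti` and `ttr` over all incidents in a period gives MTTI and MTTR, which can then be compared before and after each step.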