Slide 1

Slide 1 text

AIOps Research Notes: Automated Root Cause Diagnosis of System Failures for SREs
yuuk1 @ SAKURA internet Research Center
2022/05/15 SRE NEXT 2022 ONLINE

Slide 2

Slide 2 text

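Profile
yuuk1 (Yuuki Tsubouchi), https://yuuk.io/, @yuuk1t
・Researcher, SAKURA internet Research Center
・Third-year doctoral student, Graduate School of Informatics, Kyoto University
・Technology advisor, Topotal
・Five years as a web operations / SRE engineer after graduating
・Moved to SAKURA internet three years ago and entered the world of research and development
・Enrolled in the doctoral program two years ago
・Keynote speaker at SRE NEXT 2020 IN TOKYO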

Slide 3

Slide 3 text

SRE DIVERSITY
・2013: Web operations / SRE
・2018: Research on efficient observation methods for operational data [Y. Tsubouchi 2021], [Y. Tsubouchi 2022]
・2020 onward: AIOps research
[Y. Tsubouchi 2021]: Yuuki Tsubouchi, Asato Wakisaka, Ken Hamada, Masayuki Matsuki, Takahiro Kobayashi, Hiroshi Abe, Ryosuke Matsumoto, HeteroTSDB: a high-performance time series database based on automatic tiering across heterogeneous distributed KVSs (in Japanese), IPSJ Journal, Vol.62, No.3, pp.818-828, March 2021.
[Y. Tsubouchi 2022]: Yuuki Tsubouchi, Masayoshi Furukawa, Ryosuke Matsumoto, Low Overhead TCP/UDP Socket-based Tracing for Discovering Network Services Dependencies, Journal of Information Processing, Vol.30, pp.260-268, 2022.

Slide 4

Slide 4 text

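Rather than a systematic survey of AIOps research, this talk covers the trial and error I went through on problems I ran into while doing research without prior data science experience.
I cannot offer knowledge or techniques that you can use tomorrow in software development and operations, but I hope you take home something that gives a feel for a future just slightly ahead.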

Slide 5

Slide 5 text

1. SRE and AI

Slide 6

Slide 6 text

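HAL 9000, "2001: A Space Odyssey"
Quoted from Chapter 18, "Introduction to Machine Learning for SRE"
"I've just picked up a fault in the AE35 unit. It's going to go 100% failure within 72 hours." (HAL 9000, "2001: A Space Odyssey")
"The future depicted in this film was conceived with foresight by Arthur C. Clarke, who combined AI with fully automated services able to predict system and hardware failures hours in advance. HAL 9000 is humanity's dream (or nightmare) of an autonomous, self-regulating, flawless machine, serving both the ship's crew and the mission in order to achieve goals defined by humans."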

Slide 7

Slide 7 text

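Tracing the origins of applying AI to the operation of information systems
[Cebulka 1989]
・In the 1980s, the possibility of applying knowledge-based AI and neural-network-based AI to network management was already being discussed:
  ・Initial design of networks to support specific services
  ・Tactical facility planning between central offices
  ・Monitoring and diagnosis of messages from switches
[Notaro 2021]
・Several online failure prediction models for software and hardware have been proposed since the early 1990s. Other failure prevention methods date from the same period.
[Cebulka 1989]: Cebulka KD, et al., Applications of artificial intelligence for meeting network management challenges in the 1990s, IEEE GLOBECOM 1989.
[Notaro 2021]: Notaro P, et al., A Survey of AIOps Methods for Failure Management. ACM TIST, 2021.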

Slide 8

Slide 8 text

Contribution areas of AIOps today
・Research on failure management
・Research on optimization, such as resource allocation
Figure: reproduced from [Notaro '20] Fig. 2, "Taxonomy of AIOps as observed in the identified contributions"
[Notaro '20]: Notaro P, Cardoso J, Gerndt M. "A Systematic Mapping Study in AIOps." ICSOC. Springer, Cham, 2020.

Slide 9

Slide 9 text

Number of papers per AIOps research area
・AIOps-related papers: 670
・62.1% of the 670 papers relate to Failure Management
・Failure prediction (26.4%), failure detection (33.7%), root cause analysis (26.7%)
・The number of papers is increasing
[Notaro '20]: Notaro P, Cardoso J, Gerndt M. "A Systematic Mapping Study in AIOps." ICSOC. Springer, Cham, 2020.

Slide 10

Slide 10 text

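Which area of AIOps to work on?
・Failure Management is the area more directly tied to reliability
・Stages of failure management: prediction / prevention, detection, cause diagnosis, mitigation, root cause analysis, repair
・The first thing that comes to mind with AIOps?
・If system monitoring is mature, is there really a failure that goes unnoticed?
・Answering "what is the cause of the failure?" is the harder problem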

Slide 11

Slide 11 text

2. A Concept for Automating the Root Cause Diagnosis of System Failures

Slide 12

Slide 12 text

Challenges after a system failure is detected
・Alert storm
・Browsing a large number of metrics
・Increased cognitive load
CRITICAL: front-end - http_request_latency_95
CRITICAL: user - latency_95
CRITICAL: user-db - memory_usage
CRITICAL: user-db - cpu_user_usage
CRITICAL: orders - jvm_heap_memory_usage
CRITICAL: orders-db - network_transmit_bytes
CRITICAL: payment - http_request_error_5xx
CRITICAL: front-end - http_request_latency_50
CRITICAL: front-end - http_request_latency_90
CRITICAL: user-db - cpu_system_usage

Slide 13

Slide 13 text

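Why do alert storms occur?
・Symptom: what is broken? Cause: why is it broken?
・Symptoms and causes are relative to each other, so telling them apart is actually difficult
・Alert storms occur because symptoms and causes are alerted at the same time
1. Symptom: HTTP 500s or 400s are being returned / Cause: the database server is refusing connections
2. Symptom: the database server is refusing connections / Cause: the database server's disk is full
3. Symptom: the database server's disk is full / Cause: the query log file is growing rapidly
4. Symptom: the query log file is growing rapidly / Cause: …
Wouldn't it be enough to alert only on symptoms?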

Slide 14

Slide 14 text

"Alert symptoms, diagnose causes"
・Alerting based on SLOs [SRWbook 18]
・Four Golden Signals: Latency, Traffic, Errors, Saturation
・RED: Rate, Errors, Duration
・Flow: an alert on a symptom of the service as a whole acts as the trigger for automated analysis of operational data (metrics, logs, traces, events), and the AI-based cause diagnosis is passed to the SREs
[SRWbook 18]: Chapter 5 "Alerting on SLOs", Beyer B, et al., The Site Reliability Workbook: Practical ways to implement SRE. O'Reilly Media, Inc., 2018.

Slide 15

Slide 15 text

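There are predecessors with similar ideas
Both use the detection of an SLO metric violation as the trigger for causal inference.
CauseInfer (2014)
・Measures, per component, the TCP latency estimated from packet arrival times
・Generates a causal graph with the PC algorithm (described later) from the field of statistical causal discovery
・Figure: partially reproduced from CauseInfer (2014) Fig. 2
Microscope (2018)
・Compared with CauseInfer, also takes non-communication dependencies into account
・Generates a causal graph in the same way as CauseInfer
・In microservices, response time is measured at each service
・Figure: reproduced from Microscope (2018) Fig. 2
Chen P, et al., CauseInfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems. IEEE INFOCOM 2014.
Lin J, et al., Microscope: Pinpoint performance issues with causal graphs in micro-service environments. ICSOC, 2018.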

Slide 16

Slide 16 text

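There are predecessors with similar ideas: advanced edition
Output is a ranking of paths in the causal graph
・AutoMAP (2020): per microservice, combines individual causal graphs obtained from seven kinds of metrics rather than response time alone
・FluxInfer (2020): a weighted undirected dependency graph plus PageRank instead of the PC algorithm
・MicroCause (2020): an improvement of the PC algorithm that takes time series lag into account
Output is a ranking of root-cause metrics
・PatternMatcher (2021): classifies anomaly patterns of metrics with a CNN
・FluxRank (2019): clusters similar time series, then ranks them with logistic regression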

Slide 17

Slide 17 text

Further prior work
AIOps-related survey papers
・Notaro P, Cardoso J, Gerndt M. A Survey of AIOps Methods for Failure Management. ACM Transactions on Intelligent Systems and Technology (TIST). 2021 Nov 30;12(6):1-45.
・Lyu Y, Rajbahadur GK, Lin D, Chen B, Jiang ZM. Towards a Consistent Interpretation of AIOps Models. ACM Transactions on Software Engineering and Methodology (TOSEM). 2021 Nov 15;31(1):1-38.
・Soldani J, Brogi A. Anomaly detection and failure root cause analysis in (micro) service-based cloud applications: A survey. ACM Computing Surveys (CSUR). 2022 Feb 3;55(3):1-39.
List of root cause diagnosis papers
・https://github.com/dreamhomes/RCAPapers
International conference IEEE CLOUD
・https://blog.yuuk.io/entry/2020/ieeecloud2020
Tsinghua University NETMAN LAB
・https://netman.aiops.org/publications/

Slide 18

Slide 18 text

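Exploring areas left unresolved by prior work
Underlying concern
・It has become possible to collect metrics in large volumes
・Even so, are those metrics actually being put to use?
Issues
・Issue 1: the kinds and number of input metrics must be specified in advance.
・Issue 2: an appropriate function must be chosen and tuned individually for each specified variable.
・Issue 3: metrics alone are not enough to make the estimated result explainable.
Goal: generate a causal graph with all metrics observed in the system as input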

Slide 19

Slide 19 text

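Research problem setting: preprocessing for root cause diagnosis
・Feeding all metrics into root cause diagnosis makes the processing time long
・No individual assumptions are placed on heterogeneous, mixed metrics
・Flow: failure detection → fetch a fixed-width window of the most recent metrics → proposal: reduce the number of time series → root cause diagnosis (causal graph generation)
・Offline analysis, with an execution time on the order of minutes
・Result: fast root cause diagnosis becomes possible without specifying metrics in advance
Yuuki Tsubouchi, et al., TSifter: a dimensionality reduction method for time series data aimed at rapid diagnosis of performance anomalies in microservices (in Japanese), Proceedings of the Internet and Operation Technology Symposium, 2020.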

Slide 20

Slide 20 text

3. Trial and Error in Time Series Analysis and Causal Graph Generation

Slide 21

Slide 21 text

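Tracing the flow of an operator's cognitive process
・Phase 1, offline anomaly detection: (1) find the time series that contain an anomaly within the period
・Phase 2, clustering: (2) group the time series whose graphs have similar shapes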

Slide 22

Slide 22 text

Pipeline
・Preprocessing: Phase 1 (offline anomaly detection) → Phase 2 (shape clustering)
・Root cause diagnosis: causal graph generation


Slide 23

Slide 23 text

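Phase 1: Focus on the anomalousness of univariate time series
・The failure has already been detected, so offline anomaly detection is sufficient
・Analyzing a short window of about 30 minutes is enough, so seasonality and changes in the normal mode caused by releases need not be considered.
・Figure: an example that classifies time series anomaly patterns into 13 patterns [PatternMatcher 2021]
[PatternMatcher 2021]: Wu C, Zhao N, Wang L, Yang X, Li S, Zhang M, Jin X, Wen X, Nie X, Zhang W, Sui K. Identifying Root-Cause Metrics for Incident Diagnosis in Online Service Systems.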

Slide 24

Slide 24 text

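A test for the difference between the distributions of two samples: the K-S test
・Tests whether sample X and sample Y arise from the distribution of the same population
・Split the time series into a normal period and a test period, and test the difference in distribution between the two periods (the approach adopted in [PatternMatcher 2021])
Cases that did not work well
・Time series containing prominent outliers such as short spikes (p-values in the two example plots: 0.11 and 0.51)
・With only a few outliers, the distributions are not judged to be different
[PatternMatcher 2021]: Wu C, Zhao N, Wang L, Yang X, Li S, Zhang M, Jin X, Wen X, Nie X, Zhang W, Sui K. Identifying Root-Cause Metrics for Incident Diagnosis in Online Service Systems.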
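To make the short-spike failure mode concrete, here is a minimal sketch of a two-sample K-S test in plain Python. It is not the implementation used in the talk or in PatternMatcher. The function name `ks_two_sample`, the synthetic series, and the asymptotic p-value approximation are all assumptions made for illustration.

```python
import math

def ks_two_sample(x, y):
    """Return (D, p): the two-sample K-S statistic and an asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the two ECDFs.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution with a common small-sample correction.
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:  # the series below is numerically 1 in this range
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical data: a "normal period" and two variants of a "test period".
normal = [10.0 + (i % 5) for i in range(60)]
spiky = list(normal)
spiky[30] = 100.0                      # one short spike
shifted = [v + 5.0 for v in normal]    # sustained level shift

d_spike, p_spike = ks_two_sample(normal, spiky)
d_shift, p_shift = ks_two_sample(normal, shifted)
print(f"spike: D={d_spike:.3f} p={p_spike:.3g}")
print(f"shift: D={d_shift:.3f} p={p_shift:.3g}")
```

A single spike moves the empirical distribution by only 1/60, so the test does not reject, while the level shift is rejected decisively. This matches the observation that a few outliers are not enough for the distributions to be judged different.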

Slide 25

Slide 25 text

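Applying the time series property of "stationarity" (skipped in the talk)
・Intuition: the characteristics of the time series do not change over time
・Definition: the mean and variance of the series are constant regardless of time, and the autocovariance depends only on the time lag
・Series that do not fit stationarity are regarded as anomalous
・Stationarity can be tested with the ADF test
Cases that did not work well
・A short spike is judged to be stationary
・The ADF test checks whether the differenced series satisfies stationarity, and in the first-order difference a spike reverts to the mean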

Slide 26

Slide 26 text

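Basic steps of statistical anomaly detection
・Use methods that are widely known as anomaly detection techniques
・Assume that the normal pattern can be expressed by a probability distribution
1. Estimate the distribution: assume a probability distribution with unknown parameters as the normal model, and estimate the unknown parameters from the data
2. Compute the anomaly score: define in advance a measure of deviation from normal, and compute it
3. Set the threshold: judge anomalies by thresholding the anomaly score
Reference: Tsuyoshi Ide, Introduction to Anomaly Detection with Machine Learning (in Japanese), Section 7.3, Corona Publishing, 2015.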

Slide 27

Slide 27 text

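Outlier detection: Hotelling's $T^2$ method
・Assume that each point of the time series follows a normal distribution (example: a network transmit bytes metric, converted to a frequency distribution)
1. Distribution estimation. The unknown parameters are estimated by the sample mean and the sample variance:
$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) $$
$$ \mu = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 $$
2. Anomaly score:
$$ a(x) = \left( \frac{x-\mu}{\sigma} \right)^2 $$
3. Threshold decision: the anomaly score follows a chi-squared distribution with 1 degree of freedom, so the threshold can be chosen as a probability (0.01, 0.05, and so on)
・Weakness: sensitive to noise
Reference: Tsuyoshi Ide, Introduction to Anomaly Detection with Machine Learning (in Japanese), Section 7.3, Corona Publishing, 2015.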
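The three steps can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the talk's code. The function names and the synthetic series are assumptions. The threshold uses the fact that the square of a standard normal variable follows a chi-squared distribution with 1 degree of freedom.

```python
from statistics import NormalDist, fmean

def hotelling_t2_scores(series):
    """Steps 1 and 2: estimate mu and sigma^2, then score every point."""
    mu = fmean(series)
    var = sum((x - mu) ** 2 for x in series) / len(series)  # 1/N estimator, as on the slide
    return [(x - mu) ** 2 / var for x in series]

def chi2_1dof_threshold(alpha):
    """Step 3: upper-alpha quantile of chi-squared(1), computed as z_{1-alpha/2}^2."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z

# Hypothetical metric: a small periodic pattern with one large outlier at index 40.
series = [100.0 + (i % 7) for i in range(70)]
series[40] = 160.0

scores = hotelling_t2_scores(series)
threshold = chi2_1dof_threshold(0.01)
anomalies = [i for i, a in enumerate(scores) if a > threshold]
print(round(threshold, 3), anomalies)
```

Note that the outlier itself inflates the estimated variance, which is one reason the method reacts poorly to noisy data.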

Slide 28

Slide 28 text

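Anomaly detection with an autoregressive model (AR model)
・Autoregressive model: a model that regresses the value at time t on the values before time t
・Intuition: assume that the future can be predicted from past values
・Given a time series $y_1, \ldots, y_N$:
$$ y_n = \sum_{t=1}^{r} a_t \, y_{n-t} + v_n $$
・$r$: lag order; $a_t$: autoregressive coefficients; $v_n$: white noise following a normal distribution with mean 0 and variance $\sigma^2$
・Fix $r$, then estimate the coefficients $a_t$ and the variance $\sigma^2$ from the observed values
・$r$ is commonly chosen with the Akaike information criterion (AIC)
・$a_t$ and $\sigma^2$ are estimated by least squares, which gives an approximate maximum likelihood estimate
・Anomaly score = prediction error (predicted value − observed value)
・Figure: observed values (blue), predicted values (orange), and the anomaly score
Reference: Tsuyoshi Ide, Introduction to Anomaly Detection with Machine Learning (in Japanese), Section 7.3, Corona Publishing, 2015.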
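The following is a minimal sketch of AR-based anomaly scoring using only the standard library. It is not the talk's implementation. Several things are assumptions for illustration: the lag order is fixed by hand instead of being chosen by AIC, the anomaly score is the squared prediction error, and the test signal is a noise-free sinusoid chosen because it satisfies an exact AR(2) relation.

```python
import math

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def fit_ar(y, r):
    """Least-squares estimate of a_1..a_r in y_n = sum_t a_t * y_{n-t} + v_n."""
    rows = [[y[n - t] for t in range(1, r + 1)] for n in range(r, len(y))]
    targets = y[r:]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(r)] for i in range(r)]
    xty = [sum(row[i] * tgt for row, tgt in zip(rows, targets)) for i in range(r)]
    return solve_linear(xtx, xty)

def anomaly_scores(y, coeffs):
    """Squared one-step prediction error; the first r points get a score of 0."""
    r = len(coeffs)
    scores = [0.0] * r
    for n in range(r, len(y)):
        pred = sum(a * y[n - t] for t, a in enumerate(coeffs, start=1))
        scores.append((y[n] - pred) ** 2)
    return scores

# Hypothetical series: sin(0.3 n) obeys y_n = 2cos(0.3) y_{n-1} - y_{n-2} exactly.
series = [math.sin(0.3 * n) for n in range(120)]
coeffs = fit_ar(series[:60], r=2)   # fit on a clean training period
series[80] += 5.0                   # inject a spike afterwards
scores = anomaly_scores(series, coeffs)
flagged = [n for n, s in enumerate(scores) if s > 1.0]
print(coeffs, flagged)
```

The spike is flagged at its own index and at the next two indices, because the corrupted value is also used as an input when predicting the following r points.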

Slide 29

Slide 29 text

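Autoregressive model: in-sample prediction and out-of-sample prediction
In-sample prediction error: estimate the unknown parameters (the coefficients $a_t$) on all of the observed data
・○ Prominent outliers
・○ White noise
・✗ Level shifts: the model overfits the shift, so the error stays small
Out-of-sample prediction error: split the data into a training period and a validation period (1:1 in the figure), estimate the unknown parameters from the training data, and compute the prediction error against the validation data
・○ Level shifts
・✗ False positives caused by reacting to noise (the same as Hotelling…?)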

Slide 30

Slide 30 text

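What to do next (unresolved at the time of recording)
・Capturing both short-lived anomalies and all other kinds of anomalies well is difficult
・Rather than looking at point-wise outliers, adopt an anomaly score based on a cumulative error such as MSE (the mean squared prediction error)
・Combine multiple methods
・Figure: reproduced from [PatternMatcher 2021] Fig. 2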

Slide 31

Slide 31 text

Phase 2: Shape clustering (skipped in the talk)
Pipeline
・Preprocessing: Phase 1 (offline anomaly detection) → Phase 2 (shape clustering)
・Root cause diagnosis: causal graph generation

Slide 32

Slide 32 text

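Phase 2: Group time series with similar shapes (skipped in the talk)
・In machine learning, this task is called "clustering"
・How similar are the individual data items? (similarity / distance measure)
・By what procedure are the clusters discovered?
・Choosing the cluster representative: the time series nearest to the centroid, or the time series with the largest integral
・Figure: an example in which metrics for network send/receive bandwidth and packet counts were clustered together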

Slide 33

Slide 33 text

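Scope of clustering (skipped in the talk)
・Cluster within the same component (figure: Proxy, App, DB, with Cluster 1 and Cluster 2 inside a component)
・Clustering metrics across components loses nodes that the causal graph needs
・One example is the case where a symptom metric and a cause metric happen to have similar shapes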

Slide 34

Slide 34 text

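The world of time series clustering (skipped in the talk)
・Of the options in the taxonomy, this work uses whole time series clustering with a shape-based approach
・When a person looks at time series graphs and judges them to be similar, the judgment is presumably based on shape
・Figures: reproduced from [Paparrizos 15] Fig. 1, "Time-series clustering taxonomy", and Fig. 2, "The time-series clustering approaches"
[Paparrizos 15]: Paparrizos J, Gravano L. k-Shape: Efficient and accurate clustering of time series. SIGMOD 2015.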

Slide 35

Slide 35 text

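Shape similarity in time series clustering (skipped in the talk)
・Scaling invariance: are the series similar after stretching along the vertical axis? Achieved with the z-score transform (a transform to mean 0 and standard deviation 1)
・Shifting invariance: are the series similar after shifting along the horizontal axis?
・ED (Euclidean distance): does not account for shifts along the horizontal axis
・DTW (Dynamic Time Warping, the mainstream choice): computes the distances between all pairs of points of the two time series, so the computational cost is high
・Figures: partial excerpts reproduced from [k-Shape 2015] Figure 1 and Figure 2, "Similarity computation"
[k-Shape 2015]: Paparrizos J, Gravano L. k-Shape: Efficient and accurate clustering of time series. SIGMOD 2015.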

Slide 36

Slide 36 text

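SBD (Shape-Based Distance) (skipped in the talk)
Cross-correlation
・Inner product of vectors: takes a large value when the angle between the vectors is small (when $\theta$ is close to 0)
$$ x \cdot y = |x||y| \cos\theta $$
・Treat one time series as one vector: $x = (x_1, \ldots, x_m)$, $y = (y_1, \ldots, y_m)$
・Compute the inner product while sliding one vector, and find the shift $w$ that maximizes it
・$CC_w(x, y)$ is the similarity to $x$ when $y$ is translated to the right $k = w - m$ times:
$$ CC_w(x, y) = R_{w-m}(x, y) $$
$$ R_k(x, y) = \begin{cases} \sum_{l=1}^{m-k} x_{l+k} \cdot y_l, & k \ge 0 \\ R_{-k}(y, x), & k < 0 \end{cases} $$
・Compute $CC_w(x, y)$ for $w \in \{1, 2, \ldots, 2m-1\}$
・The fast Fourier transform reduces the computational cost from $O(m^2)$ to $O(m \log m)$
Paparrizos J, Gravano L. k-Shape: Efficient and accurate clustering of time series. SIGMOD 2015.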
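The sliding inner product can be sketched directly from the definition of $R_k$. The distance itself follows the k-Shape paper: $SBD(x, y) = 1 - \max_w CC_w(x, y) / \sqrt{R_0(x, x) \, R_0(y, y)}$. This sketch is not the talk's code. It uses the plain $O(m^2)$ loop rather than the FFT, skips the z-score step, and the function names and the two example series are assumptions for illustration.

```python
import math

def cross_correlation(x, y, k):
    """R_k(x, y) = sum_l x[l+k] * y[l] for k >= 0, and R_{-k}(y, x) for k < 0."""
    if k < 0:
        return cross_correlation(y, x, -k)
    return sum(x[l + k] * y[l] for l in range(len(x) - k))

def sbd(x, y):
    """Return (distance, best_shift) using the coefficient-normalized cross-correlation."""
    m = len(x)
    norm = math.sqrt(cross_correlation(x, x, 0) * cross_correlation(y, y, 0))
    # k = w - m ranges over -(m-1) .. (m-1) for w in 1 .. 2m-1.
    best_k = max(range(-(m - 1), m), key=lambda k: cross_correlation(x, y, k))
    return 1.0 - cross_correlation(x, y, best_k) / norm, best_k

# Hypothetical series: the same pulse, but y is delayed by 5 steps and scaled by 3.
x = [0.0] * 30
x[10:13] = [1.0, 2.0, 1.0]
y = [0.0] * 30
y[15:18] = [3.0, 6.0, 3.0]

distance, shift = sbd(x, y)
euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
print(distance, shift, round(euclid, 3))
```

The Euclidean distance is large because the pulses do not overlap. SBD is zero because the shapes match once y is shifted back by 5 steps and the amplitude is normalized away.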

Slide 37

Slide 37 text

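Examining the clustering procedure (skipped in the talk)
Hierarchical clustering
・Agglomerative: start with each data item as its own cluster and merge clusters step by step. High computational cost.
・Divisive: start with the whole data set as a single cluster and split clusters step by step. High computational cost.
Partitional (optimization-based) clustering
・Define a function that expresses the quality of the clusters, and find the clusters that optimize it
・The number of clusters must be decided in advance
・The well-known k-means belongs here
・Low computational cost
Reference: Toshihiro Kamishima, Clustering, https://www.kamishima.net/archive/clustering.pdf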

Slide 38

Slide 38 text

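Speed of hierarchical clustering (skipped in the talk)
・Partitional (optimization-based) clustering: must be run repeatedly to determine the optimal number of clusters
・Hierarchical clustering (single linkage, i.e. nearest-neighbor method): a distance threshold is set in advance, and the number of clusters follows from it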
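Cutting a single-linkage dendrogram at a distance threshold yields the connected components of the graph that links every pair closer than the threshold. That allows a short sketch without building the dendrogram. This is not the talk's implementation. The function name is an assumption, and scalar points with an absolute-difference distance stand in for time series with SBD.

```python
def single_linkage_clusters(items, distance, threshold):
    """Group items whose chain of pairwise distances stays within the threshold."""
    parent = list(range(len(items)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(items)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical data: scalar stand-ins for time series.
points = [1.0, 1.2, 1.5, 5.0, 5.1, 9.0]
clusters = single_linkage_clusters(points, lambda a, b: abs(a - b), threshold=0.5)
print(clusters)  # lists of indices into points
```

The number of clusters is not an input. It follows from the threshold, which is why a single pass is enough.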

Slide 39

Slide 39 text

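Phase 2: Clustering summary (skipped in the talk)
・Group time series with similar shapes
・Adopt the cross-correlation-based SBD as the distance measure
・For speed, adopt hierarchical clustering with single linkage
・There were supposed to be no open issues at this point, but…
・I noticed that [FluxRank 2019] adopts time series clustering based on density-based clustering plus Pearson correlation, so a comparison is needed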

Slide 40

Slide 40 text

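Root cause diagnosis: causal graph generation
Pipeline
・Preprocessing: Phase 1 (offline anomaly detection) → Phase 2 (shape clustering)
・Root cause diagnosis: causal graph generation
・Goal: generate a causal graph to check the result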

Slide 41

Slide 41 text

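Applying causal discovery to the root cause diagnosis of metrics
・Input: M random variables (M time series), with N samples per random variable
・Output: a DAG (directed acyclic graph)
・Metrics shown in the example: front-end:latency, user:latency, orders:latency, user:cpu_usage, user-db:cpu_usage, orders:network_transmit_bytes, orders-db:network_receive_bytes
・Label in the figure: symptom metric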

Slide 42

Slide 42 text

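The world of statistical causal discovery
Estimating causal relationships from observational data
Nonparametric (no assumptions placed)
・Constraint-based: build the causal graph using the conditional independences between variables as constraints
  ・PC algorithm (1991)
  ・FCI algorithm (1995): takes unobserved common causes into account
・Score-based: evaluate the goodness of the model for each Markov equivalence class, which is the set of causal graphs that imply the same conditional independences
  ・GES algorithm (2003)
Parametric / semiparametric
・Structural-equation-based: express the structure as equations with the cause as X and the effect as Y, placing assumptions on the functional form and on the distribution of the error variables
  ・LiNGAM (2006)
Glymour C, et al., Review of causal discovery methods based on graphical models. Frontiers in genetics. 2019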

Slide 43

Slide 43 text

43 1. ॳظԽɿ Mݸͷ֬཰ม਺ͷ׬શ࿈݁άϥϑΛߏங͢Δ PCΞϧΰϦζϜ 2. Τοδ࡟আɿ ྡ઀͢Δ֤ม਺ʹ͍ͭͯɺ৚݅෇͖ಠཱੑ͕ଘࡏ͢Ε͹ɺ2 ม਺ؒͷΤοδΛ࡟আɻ 3. ϧʔϧϕʔεͷํ޲ܾఆʢv-structureɺΦϦΤϯςʔγϣϯϧʔϧʣ ಠཱੑɿม਺ؒͷ૬ؔΛΈΔͳͲͷख๏Ͱಠཱ͔Ͳ͏͔Λਪఆɻ ৚݅෇͖ಠཱੑɿ͋Δม਺Λ৚݅ͱͨ͠ͱ͖ͷଞͷ2ม਺ͷಠཱੑ

Slide 44

Slide 44 text

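A custom extension of the PC algorithm using prior knowledge (skipped in the talk)
1. Initialization
・Problem: the complete graph has many edges, so the edge removal step takes a long time
・(1) Unconditionally remove the edges between variables that do not communicate directly over the network, and start from the graph that results
2. Edge removal
3. Rule-based orientation
・Problem: the orientation is sometimes wrong
・(2) Decide and correct the direction based on collected communication data
  ・L4 communication: direction of TCP/UDP connection initiation, lag correlation of packets
  ・L7 communication: direction of HTTP requests
  ・…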
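Extension (1) can be sketched as a filter on the initial graph. This sketch is not the talk's implementation and rests on several assumptions: the function name is invented, metric names follow the `component:metric` form seen on the slides, the communication topology below is hypothetical, and keeping edges between metrics of the same component is my own choice for the example.

```python
from itertools import combinations

def initial_edges(metrics, communicates):
    """Initial undirected edges, kept only within a component or between communicating ones.

    metrics: list of "component:metric" names.
    communicates: set of frozenset({component_a, component_b}) pairs.
    """
    def component(metric):
        return metric.split(":", 1)[0]

    edges = set()
    for a, b in combinations(metrics, 2):
        ca, cb = component(a), component(b)
        if ca == cb or frozenset((ca, cb)) in communicates:
            edges.add(frozenset((a, b)))
    return edges

# Hypothetical topology: front-end calls user and orders; user calls user-db.
metrics = ["front-end:latency", "user:latency", "user-db:cpu_usage", "orders:latency"]
communicates = {
    frozenset(("front-end", "user")),
    frozenset(("front-end", "orders")),
    frozenset(("user", "user-db")),
}
edges = initial_edges(metrics, communicates)
print(len(edges), "of", len(metrics) * (len(metrics) - 1) // 2, "edges kept")
```

Starting the edge removal step from this smaller graph reduces the number of conditional independence tests compared with starting from the complete graph.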

Slide 45

Slide 45 text

Example of a generated causal graph

Slide 46

Slide 46 text

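Issues with the conditional independence test in the PC algorithm
・Problem: the path between the cause metric and the symptom metric gets cut
・Investigation: the conditional independence test looks at the correlation between two variables after removing the influence of the conditioning variables (partial correlation)
・Edges appear to be cut by mistake more easily when many series show similar fluctuations at the time of the failure
・Figure: the edges among three variables that show similar fluctuations are cut for no apparent reason (✗)
・Metrics in the figures: front-end:latency, user:latency, orders:latency, user:memory_usage, user-db:cpu_usage, orders:network_receive_bytes, orders-db:network_transmit_bytes
・Labels in the figure: symptom metric, cause metric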

Slide 47

Slide 47 text

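How to solve the issues of the PC algorithm? (unresolved at the time of recording)
Option: remove edges using only two-variable independence tests
・Many variables are strongly correlated, so the causal graph becomes huge
・Focus on the "start point of the change", cluster by the position of the start point, and build a causal graph per cluster
Option: take the choice of conditioning variables into account
・Exclude variables with a similar change trend from the candidates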

Slide 48

Slide 48 text

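3. Summary: organizing the individual topics and the academic contribution
・No "knowledge" that runs through all of the individual topics has been found yet
・The aim is to propose a preprocessing framework suited to root cause diagnosis
・The PC algorithm has limitations
Pipeline
・Preprocessing: Phase 1 (offline anomaly detection) → Phase 2 (shape clustering)
・Root cause diagnosis: causal graph generation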

Slide 49

Slide 49 text

4. Summary

Slide 50

Slide 50 text

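Summary
1. AIOps research falls into two areas: failure management and resource provisioning
2. The idea of "Alert symptoms, diagnose causes" and the prior work on it
3. Trial and error toward a preprocessing framework for root cause diagnosis
4. Causal graph generation and the issues with the traditional methods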

Slide 51

Slide 51 text

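What turned out to be difficult
・Knowledge of one specific area of data science is not enough on its own, so there is a lot to learn
・There are no books on the AIOps field, even in English, so it is hard to build a mental map of the field
・It is hard to raise model performance without arbitrary parameter settings
・Without visualization, you cannot tell whether the results are correct
・Real-environment data that can be used for experiments is hard to obtain
・In Japan, few people work on AIOps, so it is hard to climb the mountain together while sharing information in a community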

Slide 52

Slide 52 text

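What I did not talk about
Meltria: a system for dynamically generating datasets
・Kubernetes
・Sock Shop: microservice application
・LitmusChaos: fault injection
・Argo Workflows: scheduler
https://github.com/ai4sre/meltria
Yuuki Tsubouchi, Masaya Aoyama, Meltria: a system for dynamically generating datasets for anomaly detection and root cause analysis in microservices (in Japanese), Proceedings of the Internet and Operation Technology Symposium, 2021, pp.63-70, November 2021.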

Slide 53

Slide 53 text

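Future outlook
I would like to foster an AIOps community around failure management
・Sharing incidents and postmortems
  ・What the root-cause metrics and logs were
  ・In what order the data was examined before the failure was provisionally recovered
・Sharing operational data such as metrics
  ・Even if sharing all data is difficult, the data for specific metrics alone would help
・Fault injection and load testing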

Slide 54

Slide 54 text

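To adapt to the operation of complex software, AI automates human cognitive processing.
At the same time, we end up newly operating AI, a piece of software with a different kind of complexity.

How humanity will resolve this grand Ironies of Automation remains unknown.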