
A Future Vision of Reliability Engineering in the Cloud for the AI Era / DICOMO2022 6A-1

DICOMO2022 Session 6A (unified session: Cloud), invited talk

https://tsys.jp/dicomo/2022/program/program_abst.html#6A-1

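A long-standing challenge is how to keep adding the features that users of information services need, at high frequency, while sustaining necessary and sufficient reliability. Site Reliability Engineering (SRE), a new way of operating information services proposed by Google, can be seen as one answer to this challenge, and it is becoming widespread. This talk organizes the core concepts of SRE and then, looking toward the AI era, envisions a future form of operations centered on dialogue with AI.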

Yuuki Tsubouchi (yuuk1)

July 14, 2022

Transcript

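  1. A Future Vision of Reliability Engineering in the Cloud for the AI Era
    Yuuki Tsubouchi ※1 ※2, Hirofumi Tsuruta ※1
    2022/07/14
    DICOMO 2022 invited talk
    ※1 Sakura Internet Research Center
    ※2 Graduate School of Informatics, Kyoto University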

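  2. Profile
    Yuuki Tsubouchi (@yuuk1t) https://yuuk.io/
    Researcher, Sakura Internet Research Center
    Third-year doctoral student, Graduate School of Informatics, Kyoto University
    Technology advisor, Topotal
    Worked as an engineer at Hatena Co., Ltd. for five years after graduating
    Moved to Sakura Internet in 2019 and entered the world of research and development
    Entered the doctoral program at Kyoto University Graduate School in 2020
    Research theme: highly efficient collection of operational data, and failure root cause diagnosis based on statistical analysis and machine learning, for higher reliability in the cloud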

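  3. Working on real business inside a company, I feel it is becoming more important to present a future outlook from a researcher's standpoint.
    This talk therefore organizes the present state of engineering for the reliability of information systems and envisions approaches to reliability in the coming AI era.
    I would be glad if you can take something home as a seed of ideas for your own future research. I also hope to grow this vision together with the community.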

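  4. Agenda
    1. Reliability engineering in the cloud (Where are we now?)
    2. The future of reliability engineering in the AI era (What do we want the future 20 years ahead to look like?)
    3. Examining reliability engineering through collaboration with AI (What path closes the gap between the future and the present?)
    4. Closing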

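  6. The importance of reliability in information systems
    In information systems, "reliability is the most fundamental feature" [Beyer+, 2016].
    ・Just before this talk, large-scale failures occurred in the systems of Cloudflare (June) and KDDI (July).
    ・When people run into a failure, they no longer know whether automated systems can be trusted and become afraid of depending on them.
    ・Meanwhile, information technology engineers work hard every day to maintain the reliability of information systems.
    ・As DX accelerates, tackling problems related to reliability is important.
    [Beyer+, 2016] Site Reliability Engineering: How Google Runs Production Systems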

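  7. The present state of "reliability" in information systems
    Today's information systems are generally delivered through cloud computing.
    Cloud: a system formed by resource sharing, wide-area networks, heterogeneous software/hardware, and the complex interactions among them (Infrastructure / Platform / Application, reached over the Internet).
    Site Reliability Engineering (SRE): approaches for achieving both high-frequency change and high reliability are spreading [Beyer+, 2016]. Leading companies make changes multiple times a day or more [Humble+, 2018].
    Next, we look at the historical transitions that lead to the "present" of reliability.
    [Humble+, 2018] Accelerate: The Science of Lean Software and DevOps: Building and scaling high performing technology organizations
    [Beyer+, 2016] Site Reliability Engineering: How Google Runs Production Systems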

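  8. Historical transitions in matters related to reliability
    Era | Target of reliability | How systems are delivered | Approach to high reliability | Definition of reliability
    1940~60 | Hardware | Equipment is shipped physically | Make it last long without failing | (Durability) The ability of an item to perform as required, without failure, for a given period under given conditions
    1960~80 | Software | Delivered as source code or executables, or together with the computer it is installed on | Verify in advance the output of each component and of their combinations | (Maintainability) (Design reliability)
    1980~2000 | Internet | A single huge network is shared | Design communication protocols that assume high-speed communication, latency and transmission errors, and node failures | (Dependability) The ability of an item to perform as and when required
    2000~ | Cloud | The provider centrally manages the system and keeps it always running; users reach it over the Internet | Design highly reliable systems on top of a set of unreliable components | No additional rigorous definition could be confirmed. Reliability is defined case by case and concretely, from a more user-oriented viewpoint.
    ※1 JIS Z 8115:2019 (source of the definitions marked ※1 on the slide)
    [Saleh+, 2006] Highlights from the early (and pre-) history of reliability engineering
    [Kleppmann, 2017] Designing Data-intensive Applications: The big ideas behind reliable, scalable, and maintainable systems
    [Yamamoto+, 2021] 確率・統計から始める エンジニアのための信頼性工学 - 身近な故障から宇宙開発まで - (Reliability engineering for engineers, starting from probability and statistics: from everyday failures to space development), Corona Publishing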

  9. Historical transitions in matters related to reliability (same table as slide 8)
    Callout: from replicated products to always-running, one-of-a-kind systems

  10. Historical transitions in matters related to reliability (same table as slide 8)
    Callout: toward systems whose parts interact with one another
    Callout: toward design and maintenance that assume parts will fail

  11. Historical transitions in matters related to reliability (same table as slide 8)
    Callout: toward a broader definition that assumes parts will fail

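  12. The changing life cycle of software systems
    Past: a two-stage life cycle, in which the developed deliverable is handed over and then operated and maintained.
    Present: a life cycle without stages, in which development and operation of an always-running system on the cloud are practiced at the same time. Reference: DevOps [Allspaw+, 2009]
    The form of delivery has changed to an always-running, one-of-a-kind service.
    On the other hand, because changes are frequent, changes become the trigger of failures. At Google, 68% of outage triggers are changes [Beyer+, 2018].
    [Allspaw+, 2009] 10+ Deploys Per Day: Dev and Ops Cooperation at Flickr https://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
    [Beyer+, 2018] The Site Reliability Workbook: Practical Ways to Implement SRE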

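  13. A way of thinking that assumes changes will cause failures
    How can high-frequency change and high reliability both be achieved?
    Site Reliability Engineering (SRE) [Beyer+, 2016]
    ・Do not aim for perfect reliability
    ・Set reliability indicators and their target values
    ・Treat the target value as a lower bound; above it, developers may make changes aggressively
    ・Control "reliability" and maximize the speed of change
    [Beyer+, 2016] Site Reliability Engineering: How Google Runs Production Systems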
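The idea of a reliability target that acts as a lower bound can be sketched with the error budget described in [Beyer+, 2016]. This is a minimal sketch, not the speaker's code: the names `Slo`, `error_budget`, and `can_release` are hypothetical, and the request-based indicator is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical reliability target: fraction of requests that must succeed."""
    target: float  # e.g. 0.999 means 99.9%


def error_budget(slo: Slo, total_requests: int) -> int:
    """Number of failed requests the target tolerates in the measurement window."""
    return round((1.0 - slo.target) * total_requests)


def can_release(slo: Slo, total_requests: int, failed_requests: int) -> bool:
    """Changes stay allowed while observed failures are within the budget."""
    return failed_requests < error_budget(slo, total_requests)


if __name__ == "__main__":
    slo = Slo(target=0.999)
    print(error_budget(slo, 1_000_000))
    print(can_release(slo, 1_000_000, failed_requests=400))
    print(can_release(slo, 1_000_000, failed_requests=1_200))
```

Because the target is below 100%, the remaining budget is what lets developers keep changing the system; once it is used up, the same rule slows changes down.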

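  14. The spread of Site Reliability Engineering (SRE)
    2004: the dawn at Google
    ・Operations redefined through software engineering
    ・Operations automated with software
    2014 onward: a period of rapid worldwide change
    ・Automating operations with software was already common practice
    ・Approaches that tolerate failures on the premise of change became widespread
    2014: the first SREcon was held by USENIX
    There is not yet a strict definition of SRE shared across the community.
    Its maturity as an academic field, comparable to "reliability engineering", is still low.
    We therefore consider its relationship to existing techniques for high reliability.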

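  15. A layered model for fault tolerance in the cloud (the authors' own)
    A three-layer onion model reflecting recent technology. SRE focuses on layer 3.
    1. Automatic control based on protocols (fault tolerance) [Component-level]
    • Handles faults and degradation at the component and communication level
    • Transmission control and routing protocols, distributed consensus algorithms, etc.
    2. Automatic control that follows the desired state declared by administrators [System-level]
    • Deployment, autoscaling, and management of compute resources
    • Orchestrators such as Borg and Kubernetes
    3. Manual control by operators [Service-level]
    • Responding to failures by hand so that the reliability target is met
    • Prevention, prediction, detection, root cause diagnosis, mitigation, post-incident analysis, repair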
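Layer 2 of the model, automatic control that follows a declared desired state, can be illustrated with a small reconciliation function. This is a simplified sketch under assumptions: `reconcile` and `control_loop` are hypothetical names, the only managed resource is a replica count, and it does not reproduce the Borg or Kubernetes APIs.

```python
def reconcile(desired: int, running: int) -> list[str]:
    """Return the actions that move the observed replica count toward the declared one."""
    if running < desired:
        return ["start"] * (desired - running)
    if running > desired:
        return ["stop"] * (running - desired)
    return []  # already in the desired state


def control_loop(desired: int, observations: list[int]) -> list[list[str]]:
    """Apply reconcile to each observation, as a periodic control loop would."""
    return [reconcile(desired, seen) for seen in observations]


if __name__ == "__main__":
    # The administrator declares 3 replicas; the loop reacts to what it observes.
    for actions in control_loop(3, [1, 3, 5]):
        print(actions)
```

The administrator only states the target; what to do about a crashed or surplus replica is derived from the difference, which is why this layer needs no operator in the loop.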

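  16. Time required for manual control by operators
    According to 1,818 incident reports from 599 organizations, more than half of incidents are resolved within 2 hours ※1
    There is ample room to shorten recovery.
    ※1 The VOID Report 2021 https://www.thevoid.community/report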

  17. A layered model for fault tolerance in the cloud (the authors' own; same model as slide 15)
    Callout on layer 3 (manual control by operators): automation by AI (AIOps)
    Because the mechanisms behind failures are hard to grasp, automation through data-driven learning is being researched.

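  18. AIOps (Artificial Intelligence for IT Operations)
    ・IT operators are required to perform tedious manual management work and tasks with high cognitive load
    ・Detecting failures and diagnosing their causes
    ・Scaling out and scaling in according to load
    ・Managing alerting and responding to incidents
    ・Proposed by Gartner in 2017 ※1 (some say the term originally stood for Algorithmic IT Operations)
    ・A general term for efforts that apply AI (artificial intelligence) techniques, such as statistical analysis and machine learning, to the management and improvement of IT services
    [Notaro '20]: Notaro, P., Cardoso, J., and Gerndt, M. "A Systematic Mapping Study in AIOps." ICSOC. Springer, Cham, 2020.
    [Dang '19]: Dang, Y., Lin, Q., and Huang, P. "AIOps: Real-World Challenges and Research Innovations." ICSE-Companion. IEEE, 2019.
    ※1 https://blogs.gartner.com/andrew-lerner/2017/08/09/aiops-platforms/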

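  19. The relationship between SRE and AIOps
    ・A policy that does not aim for perfect reliability can easily accommodate the fallibility of data-driven AI
    ・If perfect reliability were the goal, a black-box AI could not be trusted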

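  20. Tracing the origins of applying AI to the operation of information systems
    ・In the late 1980s, the possibility of applying knowledge-based AI and neural-network-based AI to network management was already being discussed [Cebulka 1989]
    ・Initial design of networks to support specific services
    ・Tactical facility planning between central offices
    ・Monitoring and diagnosing messages from switches
    ・From the early 1990s, several online failure prediction models for software and hardware were proposed; other failure prevention methods date from the same period [Notaro 2021]
    [Cebulka 1989]: Cebulka KD, et al., Applications of artificial intelligence for meeting network management challenges in the 1990s, IEEE GLOBECOM 1989.
    [Notaro 2021]: Notaro P, et al., A Survey of AIOps Methods for Failure Management. ACM TIST, 2021.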

  21. Areas where AIOps contributes today
    Figure reproduced from [Notaro '20], Fig. 2 "Taxonomy of AIOps as observed in the identified contributions"
    Annotations: research on failure management; research on optimization such as resource allocation
    [Notaro '20]: Notaro, P., Cardoso, J., and Gerndt, M. "A Systematic Mapping Study in AIOps." ICSOC. Springer, Cham, 2020.

  22. Number of papers per AIOps research area [Notaro+, ICSOC2020]
    ・Number of AIOps-related papers: 670 (as of 2020)
    ・62.1% of the 670 papers relate to Failure Management
    ・Failure prediction (26.4%), failure detection (33.7%), root cause analysis (26.7%)
    The number of papers is on the rise.
    [Notaro+, ICSOC2020] Notaro, P., Cardoso, J., and Gerndt, M. "A Systematic Mapping Study in AIOps."

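  23. Research on failure management in AIOps
    Tasks: prevention, prediction, detection, root cause diagnosis, mitigation, post-incident analysis, repair
    All of these tasks depend on the experience and intuition of operators. The slide highlights the tasks with many research papers.
    Research on auxiliary information support dominates over research on direct decisions or actions.
    Techniques: classical machine learning, statistical machine learning, statistical causal inference, deep learning (CNN/RNN/LSTM/GNN…)
    Features are taken from operational data such as metrics, logs, traces, events, and alerts.
    [Notaro+, TIST2021] A Survey of AIOps Methods for Failure Management
    [Soldani+, CSUR2022] Anomaly Detection and Failure Root Cause Analysis in (Micro)Service-Based Cloud Applications: A Survey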

  24. More detailed AIOps research examples
    "AIOps Research Notes: Automatic Root Cause Diagnosis of System Failures for SRE", SRE NEXT 2022
    https://speakerdeck.com/yuukit/sre-next-2022

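  25. Summary: 1. Reliability engineering in the cloud
    ・As the form of information systems shifted from hardware to software to the cloud, the approach to reliability shifted toward a service orientation.
    ・SRE is a failure-tolerant approach: having automated operations, it sets a lower bound on a reliability indicator and raises the speed of change.
    ・Within the layered model for fault tolerance, automation by AI (AIOps) is being researched for the outermost layer, manual control.
    ・The principles of SRE are expected to accommodate the fallibility of AI easily.
    ・At present, AIOps remains limited to auxiliary information support.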

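  27. Thinking about the future toward the coming AI era
    ・It is hard to talk about the future by looking at reliability alone
    ・So we start from how the people and applications that demand reliability should be
    Start from an aspiration for a certain point in the future, then backcast from the future to the present.
    Timeline: 2022 (present) → Phase 1 → Phase 2 → Phase 3 → 2040s → 2045 (technological singularity)
    The singularity is assumed to occur in 2045.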

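  28. A view of the future in which AI merely takes away human jobs is not interesting.
    Given how hard mutual understanding is even between humans, it should not be easy for AI to understand a person's latent thinking either.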


  29. The 2040s: Self Crafting
    An era in which anyone can freely build applications optimized for themselves through dialogue with AI.
    Pushing automation by AI to its limit paradoxically makes humans more creative.

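  30. Application development today (2022): a standardization-oriented world
    Services: developers discover latent needs common to many users and provide standardized functions and interfaces over the Internet and the cloud, updating the application continuously.
    Component technologies: application developers use standardized building blocks on the cloud, including standardized communication protocols and APIs, standardized data structures, and standardized compute units.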

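  31. Application development as a craft in the future (2040s): an individualization-oriented world
    Interaction with space (XR); self-crafting with AI over the Internet and the cloud.
    Services: the AI and the user implement functions and interfaces interactively and experientially, reaching down to the user's latent needs. They evolve through mutual interaction.
    Component technologies: learned, individually optimized components that match the application's requirements, such as learned data structures [Kraska+, SIGMOD2018] and learned communication protocols [Ma+, EuroSys2022].
    End to end, the optimal programs and protocols change dynamically and adaptively according to the user's requirements.
    [Kraska+, SIGMOD2018] The Case for Learned Index Structures
    [Ma+, EuroSys2022] Multi-Objective Congestion Control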

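  32. Reliability in the world of self-crafting
    Each user's AI requests resources from finite shared resources (the Internet and the cloud), so an appropriate reliability target has to be decided [Mogul+, HotOS2019].
    The closer the reliability target of a particular self-crafted application gets to 100%:
    Resource consumption ↑: the more redundancy and response speed are raised, the more resources are consumed, which may lower the satisfaction of others.
    Change velocity ↓: the more the application is changed through self-crafting, the less operational data exists for the changed system, so the accuracy of failure prediction and prevention drops.
    [Mogul+, HotOS2019] Nines are not enough: Meaningful metrics for clouds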

  33. Can AI appropriately decide the reliability target as an equilibrium point?

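  34. The limits of humans commanding (declaring to) AI
    "Please try to think from the side of the artificial intelligence being commanded. A command is given as a combination of ambiguous 'meanings', and the interpretation of those 'meanings' is also entirely in the hands of the human giving the command. ... How far can an artificial intelligence produce the 'appropriate' answer the commander has in mind?"
    Ultra-advanced AI "Higgins", [BEATLESS] PHASE13 "BEATLESS" (quoted with some text color changed)
    "... Therefore, I cannot 'believe'. If you want to control the behavior of such a tool precisely, please give me unambiguous criteria for judgment."
    Ultra-advanced AI "Higgins", [BEATLESS] LAST PHASE "IMAGE AND LIFE" (quoted with some text color changed)
    "I am a controller that allocates resources for my owner, Arato. Asking you to 'design the future' means asking you to set the reference point for that allocation."
    hIE "Lacia", [BEATLESS] PHASE10 "PLUS ONE" (quoted with some text color changed)
    [BEATLESS]: Satoshi Hase, BEATLESS, Monthly Newtype, Kadokawa Shoten, 2012.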

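  35. "2001: A Space Odyssey" and HAL 9000
    Quoted from Chapter 18, "Introduction to Machine Learning for SRE" (translated here from the Japanese edition):
    "I have just detected a fault in the AE35 unit. I will cease to function with 100% probability within 72 hours." HAL 9000, "2001: A Space Odyssey"
    "The future this film depicts was envisioned with foresight by Arthur C. Clarke, who combined AI with fully automated services able to predict system and hardware failures hours in advance. HAL 9000 is humanity's dream (or nightmare) of an autonomous, self-adjusting, flawless machine, serving both the spacecraft's crew and the mission in order to achieve goals defined by humans."
    David N. Blank-Edelman (ed.), Yoshifumi Yamaguchi (supervising translator), Ryosuke Watanabe (translator), Seeking SRE (Japanese edition: SREの探求), O'Reilly Japan, 2021.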

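  36. The appropriate reliability is unknown → an interactive approach
    The user and the AI interactively search for the equilibrium point that is optimal for that user.
    The equilibrium point is the optimal allocation among variables such as reliability, cost, and change velocity.
    The user can explore solutions heuristically, without having to think through and declare the desired state in advance.
    Even after a solution has converged, the interactive convergence is repeated when circumstances change.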
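The interactive search for an equilibrium point can be sketched as a loop in which the AI proposes a reliability target and narrows the range according to the user's reaction. Everything here is an assumption made for illustration: the candidate list `TARGETS`, the three feedback strings, and the bisection strategy are hypothetical and are not taken from the talk.

```python
from typing import Callable

# Hypothetical candidate reliability targets (percent), ordered from low to high.
TARGETS = [99.0, 99.9, 99.99, 99.999]


def negotiate(feedback: Callable[[float], str]) -> float:
    """Propose targets until the user answers "ok" or the range is exhausted.

    `feedback` returns "too_unreliable", "too_expensive", or "ok" for a proposal.
    """
    lo, hi = 0, len(TARGETS) - 1
    proposal = TARGETS[hi]
    while lo <= hi:
        mid = (lo + hi) // 2
        proposal = TARGETS[mid]
        reaction = feedback(proposal)
        if reaction == "ok":
            break
        if reaction == "too_unreliable":
            lo = mid + 1  # move toward higher reliability (and higher cost)
        else:
            hi = mid - 1  # "too_expensive": move toward lower reliability
    return proposal


if __name__ == "__main__":
    # A simulated user who is satisfied only with 99.99%.
    def user(target: float) -> str:
        if target == 99.99:
            return "ok"
        return "too_unreliable" if target < 99.99 else "too_expensive"

    print(negotiate(user))
```

The user never declares the target up front; it emerges from reactions to concrete proposals, and the same loop can be rerun when circumstances change.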

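  37. An example of tuning through the interactive approach (an "experiential" tuning process)
    User: I want it to go down as little as possible, and to feel light and fast.
    AI: Raising reliability to a realistic level would cost 〇〇 yen.
    User: No, no, that is too expensive.
    AI: If the reliability target of feature ▲▲ is lowered from 99.999% to 99.9%, the cost drops to △△ yen.
    User: That is cheap enough, but I am worried about reliability going down.
    AI: Then, as a trial, I will degrade feature ▲▲ for just 10 minutes from now, so please judge whether you are dissatisfied with the reliability.
    User: With this many errors it really is inconvenient.
    AI: Then how about a reliability target of 99.99% at a cost of ✕✕ yen?
    Note: besides voice or text, the dialogue could also take place through physical interaction.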
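To make the targets in the dialogue concrete, the downtime each one tolerates can be computed directly. This is a minimal sketch assuming an availability-style target measured over a 30-day window; the function name and the window length are illustrative choices, not part of the talk.

```python
def allowed_downtime_minutes(target_percent: float, days: int = 30) -> float:
    """Minutes of downtime a target tolerates within a window of `days` days."""
    window_minutes = days * 24 * 60
    return window_minutes * (100.0 - target_percent) / 100.0


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.3f} min per 30 days")
```

Under these assumptions, 99.9% allows roughly 43 minutes per 30 days, 99.99% roughly 4.3 minutes, and 99.999% roughly 26 seconds, so a 10-minute trial degradation is a rough way to experience the gap between the first two.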

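  38. Comparison of the present and the future by viewpoint
    Present (2022)
      HCI: interaction through flat displays
      Characteristics: standardization-oriented, service-minded
      Applications: specialized providers develop functions for an assumed standard target user
      Component technologies: standardized data structures and protocols
      Reliability: decided from statistical summaries of measured indicators of user behavior
    Future (2040s)
      HCI: interaction with space through the five senses, in virtual, augmented, and mixed reality
      Characteristics: individualization-oriented, craft-minded
      Applications: users build the functions best suited to their own preferences; automatic programming through dialogue with AI
      Component technologies: learned data structures and protocols matched to the application
      Reliability: the equilibrium point between reliability and the other basic variables is decided interactively and experientially with AI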

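  39. Summary: 2. The future of reliability engineering in the AI era
    ・2040s: we hope for an era in which anyone builds individualized applications by themselves, for themselves (self-crafting)
    ・Because resources are finite, an individual's utility cannot be raised without limit
    ・Users need to adjust the equilibrium point between reliability and the other basic variables
    ・It is difficult for a human to command (declare) the equilibrium point to the AI in advance
    ・The appropriate reliability is decided per user, interactively and experientially with AI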

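  41. Reliability engineering from the present to the 2040s
    Timeline: 2017 (Gartner proposes AIOps) → 2022 (present) → 2030s → 2040s → 2045 (technological singularity)
    Stages: present → Phase 1: engineers collaborate with AI → Phase 2: AI handles failures autonomously → Phase 3: self-crafting
    Who controls reliability: engineers control reliability at first; in the self-crafting stage, users control reliability.
    Annotations on the roadmap:
    ・Reliability targets are declared by humans; AI gives limited assistance with detection and diagnosis.
    ・Engineers and AI prevent and recover from failures interactively; operational data becomes increasingly open.
    ・Engineers design the system architecture, and AI implements and connects the modules.
    ・Reliability targets are decided by service providers.
    ・End to end, programs and protocols evolve dynamically and adaptively according to user requirements.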

  42. Reliability engineering from the present to the 2040s (same roadmap as slide 41)
    Callout: the scope examined in this talk

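  43. The problem with AIOps from a deep learning perspective
    In today's AIOps, a learning model is built locally for each individual system.
    ・If a global model were trained on data from many systems, might it yield remarkable results like those in other fields (CV, NLP)?
    ・However, service providers are reluctant to publish operational data, in order to protect privacy.
    Diagram: System X → Component CX → Metrics CXM1, CXM2, ... → learning model over multivariate time series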

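  44. Challenges and requirements toward Phase 1 (collaboration between engineers and AI)
    Premise: learning only from the data of a small number of systems
    1. Normal periods dominate, so data for learning anomalies is scarce [Soldani+, CSUR2022]
    ↪ Requirement ①: anomalies can be caused deliberately so that they can be learned
    2. The grounds for the predictions presented by the learning model are unclear
    ↪ Requirement ②: the grounds for predictions can be presented in a language operators understand
    Diagram: the operator teaches anomalies to the AI; the AI presents what it was taught; the operator checks how well what was taught has been learned.
    [Soldani+, CSUR2022] Anomaly Detection and Failure Root Cause Analysis in (Micro)Service-Based Cloud Applications: A Survey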

  45. Interactive AIOps
    A concept in which operators and AI interactively and collaboratively learn the characteristics of the target system

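  46. Requirement ①: Experimentability
    Inspired by Chaos Engineering [Rosenthal+, 2020]: facilitating experiments that are run to discover systemic weaknesses.
    Extension of the concept: from discovering weaknesses to discovering and learning them (the operator teaches anomalies to the AI).
    1. The operator injects faults into the information system or raises and lowers its load
    2. The AI learns from the data observed during the experiment
    3. Steps 1 and 2 are repeated while varying the anomaly pattern
    [Rosenthal+, 2020] Chaos Engineering: System Resiliency in Practice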
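The three steps of experimentability can be sketched as a loop that injects faults into a system, records labeled observations, and learns from them. This is a toy sketch under stated assumptions: `observe_latency_ms` is a hypothetical stand-in that simulates the target system with random numbers, the injected fault is an artificial delay, and "learning" is reduced to picking a latency threshold between the normal and anomalous means.

```python
import random
import statistics


def observe_latency_ms(rng: random.Random, injected_delay_ms: float = 0.0) -> float:
    """Hypothetical stand-in for measuring the target system's latency."""
    return rng.gauss(100.0, 5.0) + injected_delay_ms


def run_experiments(rng: random.Random, rounds: int = 50) -> list[tuple[float, bool]]:
    """Steps 1 and 3: alternate normal runs and fault injections with varying delays."""
    samples = []
    for i in range(rounds):
        delay = 0.0 if i % 2 == 0 else rng.uniform(50.0, 200.0)
        samples.append((observe_latency_ms(rng, delay), delay > 0.0))
    return samples


def learn_threshold(samples: list[tuple[float, bool]]) -> float:
    """Step 2: learn a decision threshold from the labeled observations."""
    normal = [x for x, anomalous in samples if not anomalous]
    abnormal = [x for x, anomalous in samples if anomalous]
    return (statistics.mean(normal) + statistics.mean(abnormal)) / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    threshold = learn_threshold(run_experiments(rng))
    print(f"latency above {threshold:.1f} ms is treated as anomalous")
```

Because the operator causes the anomalies, every observation arrives with a label, which addresses the shortage of anomaly data that comes from normal periods dominating.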

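  47. Requirement ②: Explainability
    Explainable AI (XAI): AI able to explain or present its results in terms humans can understand [Adadi+, Access2018]
    The AI presents to the operator what it has been taught ※1
    1. The operator reproduces an anomaly similar to one from Requirement ①
    2. For that anomaly, the AI returns its prediction or cause together with the grounds (the features that contributed)
    What explanations enable (from FIGURE 5 of [Adadi+, Access2018]): justifying prediction results, continuous improvement between humans and the model, debugging, new discoveries
    [Adadi+, Access2018] Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
    ※1 https://speakerdeck.com/tsurubee/a-survey-on-interpretable-machine-learning-and-its-application-for-system-operation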
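Returning a result together with the features that contributed to it can be illustrated with a very simple attribution: rank each metric by how far its current value deviates from its baseline. This is a sketch under assumptions, not an XAI method from the cited survey; the function name, the metric names, and the use of z-scores as "contribution" are all illustrative.

```python
import statistics


def explain_anomaly(
    baseline: dict[str, list[float]],
    current: dict[str, float],
    top: int = 2,
) -> list[tuple[str, float]]:
    """Rank metrics by absolute z-score against their baseline history."""
    scores = {}
    for name, history in baseline.items():
        mean = statistics.mean(history)
        stdev = statistics.stdev(history) or 1.0  # avoid dividing by zero
        scores[name] = abs(current[name] - mean) / stdev
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    baseline = {
        "cpu": [10, 12, 11, 9, 10],
        "latency": [100, 102, 98, 101, 99],
        "errors": [1, 0, 1, 2, 1],
    }
    current = {"cpu": 11, "latency": 180, "errors": 1}
    for name, score in explain_anomaly(baseline, current):
        print(f"{name}: deviates by {score:.1f} standard deviations")
```

An operator who reproduced a latency fault can check whether `latency` comes out on top; if another metric does, that is a cue to debug the model or to investigate something new.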

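  48. Possibilities for more advanced collaboration with AI
    Intersystem Learnability: a system can learn from what other systems (including its own past) have learned.
    Transfer learning from a source AI to a target AI [Pan+, TKDE2009] brings:
    - faster learning
    - the ability to extrapolate
    Trainability: operators practice incident response using training programs presented by the AI.
    Through active learning [Settles, 2009], the labeling of past data is built into the training program that is presented.
    [Pan+, TKDE2009] A Survey on Transfer Learning.
    [Settles, 2009] Active Learning Literature Survey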
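One common active learning strategy, uncertainty sampling, shows how past data could be chosen for an operator to label during training. This is a minimal sketch under assumptions: the incident identifiers and predicted anomaly probabilities are made up, and the talk does not state which active learning strategy it has in mind.

```python
def pick_for_labeling(scores: dict[str, float], k: int = 2) -> list[str]:
    """Uncertainty sampling: choose the k items whose predicted anomaly
    probability is closest to 0.5, where the model is least certain."""
    return sorted(scores, key=lambda name: abs(scores[name] - 0.5))[:k]


if __name__ == "__main__":
    # Hypothetical past incidents with the model's predicted anomaly probability.
    predicted = {"incident-a": 0.95, "incident-b": 0.52, "incident-c": 0.10, "incident-d": 0.40}
    print(pick_for_labeling(predicted))
```

Asking the operator about only the most uncertain cases keeps the labeling effort small while giving the model the answers it benefits from most.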

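  49. Summary: 3. Examining reliability engineering through collaboration with AI
    ・The path from the present to the 2040s (including the technological singularity) is divided into three phases; this talk examines the level at which engineers can collaborate with AI.
    ・Under the constraint that operational data cannot be obtained widely, anomaly data has to be created by oneself and learned from.
    ・We propose "Interactive AIOps", a concept in which operators and AI interactively and collaboratively learn the characteristics of a system.
    ・The basic forms of the dialogue are experimentability (teaching the AI) and explainability (explanations from the AI).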

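  51. Summary of the whole talk
    Present
    ・Site Reliability Engineering treats reliability as something to be controlled.
    ・AIOps research is active, but it remains limited to auxiliary information support.
    Future
    ・2040s: a shift from a standardization- and declaration-oriented era to an individualization- and dialogue-oriented era.
    ・Users converge on their own individual equilibrium point for reliability, interactively and experientially with AI.
    Path
    ・Collaboration between AI and engineers → autonomous handling of failures by AI → users control reliability
    ・Where data is assumed to be scarce, anomalies have to be created by oneself.
    Information systems will be built and controlled collaboratively through dialogue and experience (experiments). This extends to reliability, a basic element of any system.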

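  52. Items for future examination in this vision
    As AI becomes more capable, the range that is a black box to engineers grows larger.
    Ironies of Automation [Bainbridge, Pergamon1983]: the irony that the more advanced a control system becomes, the more important the contribution of the human operator becomes.
    How far should we trust AI, and how should we approach the reliability of AI itself?
    [Bainbridge, Pergamon1983] Ironies of Automation. Analysis, Design and Evaluation of Man–machine Systems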

  53. Main reference books

  54. Thank you for your attention.
    Joint discussion and research, conversations, and support are all very welcome.