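TSifter: A Dimensionality Reduction Method for Time-Series Data Suited to Rapid Diagnosis of Performance Anomalies in Microservices (TSifter: マイクロサービスにおける性能異常の迅速な診断に向いた時系列データの次元削減手法) / TSifter in proceedings of IOTS2020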

The 13th IPSJ Internet and Operation Technology Symposium (IOTS2020): https://www.iot.ipsj.or.jp/symposium/iots2020-program/

Yuuki Tsubouchi (yuuk1)

December 03, 2020

Transcript

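  1. TSifter: A Dimensionality Reduction Method for Time-Series Data
    Suited to Rapid Diagnosis of Performance Anomalies in Microservices
    坪内 佑樹 (SAKURA internet, Kyoto University)
    鶴田 博文 (SAKURA internet)
    古川 雅大 (Hatena)
    IPSJ 13th Internet and Operation Technology Symposium (IOTS2020)
    December 3, 2020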

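  2. Contents
    1. Research background and objectives
    2. A metric dimensionality reduction method suited to diagnosing the causes of anomalies
    3. Experiments and evaluation
    4. Summary and future work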

  3. 1.
    Research background and objectives

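  4. The spread of microservice architectures
    Monolith → microservices: a transition to a distributed architecture split by function
    As the software behind web services grows in scale, it is becoming difficult for developers to change that software.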

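  5. Concerns when operating microservices
    ・Growth in the volume of monitoring data
    ・Complexity of dependencies
    ・Higher frequency of software changes
    The cognitive load of understanding the system rises.
    Diagnosing the cause of a performance anomaly takes more and more time.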

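  6. Existing approaches to diagnosing performance anomalies
    Metrics: each data point carries little information, but metrics are easy to collect, store, and visualize.
    Text logs: rich in information, but some events are never written to the log.
    Execution traces: reveal the throughput and execution time of each edge of the processing path, but instrumenting the application takes effort.
    Considering applicability to real environments, this work focuses on metrics.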

  7. Metric-based approaches
    Correlations can be found in each service's time-series graphs, but the location of the cause remains unclear.
    Approaches that apply "statistical causal discovery" have been proposed in the last few years:
    ① build a causal propagation graph among the series, ② infer the causal paths.
    [Figure: response-time series of Services A to F, the causal propagation graph built from them, and the inferred paths ranked Top-1 and Top-2]
    Ma, M., et al., AutoMAP: Diagnose Your Microservice-based Web Applications Automatically, WWW2020.
    Qiu, J., et al., A Causality Mining and Knowledge Graph Based Method of Root Cause Diagnosis for Performance Anomaly in Cloud Applications, Applied Sciences, 2020.
    Lin, J., et al., Microscope: Pinpoint Performance Issues with Causal Graphs in Micro-Service Environments, ICSOC2018.

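  8. Challenges of metric-based approaches
    ・The combination of metric types used for diagnosis is fixed (about 1 to 7 types).
    ・Examples: response latency only, or {response latency, CPU utilization, memory usage, …}.
    ・Metrics closer to the cause may be excluded from the result.
      For example, TCP retransmission errors are occurring while the change in network bandwidth is small.
    As many metric series as possible need to be explored.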

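  9. Proposal: dimensionality reduction of metric series for performance anomalies
    Goal: a foundation for automatically inferring how an anomaly propagates through microservices
    As the number of series (= the number of dimensions) grows, the causal propagation graph becomes huge.
    Proposal: "TSifter" (Time series Sifter), a dimensionality reduction method that reacts to a detected anomaly and quickly extracts, "temporarily", the series useful for diagnosis from all series
    Three requirements
    ・① Accuracy: series useful for diagnosis are not removed
    ・② Dimensionality reduction rate: remove as many useless series as possible
    ・③ Speed: find the cause quickly (ideally in about 1 minute)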

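  10. Overview of the root cause diagnosis system to be realized eventually
    [Figure: pipeline] series collection → time-series database → anomaly detection → (YES) fetch all series → dimensionality reduction of the series, per service (scope of the proposed method) → root cause diagnosis (causal graph construction)
    Output: causal paths among series such as Service A/req_errors, Service D/connections, Service E/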

  11. 2.
    A metric dimensionality reduction method suited to
    diagnosing the causes of performance anomalies

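  12. Requirements of the proposed method TSifter and how it meets them
    Requirements: ① accuracy, ② dimensionality reduction rate, ③ speed
    Insight 1: series whose trend does not change before and after the anomaly are unnecessary for diagnosis
    → exclude series that are stationary
    Insight 2: groups of series whose time-series graphs have similar shapes are redundant when diagnosing the anomaly
    → cluster the time series per service
    For n series, clustering costs O(kn), O(n^2), ...
    → run the O(n) exclusion step of Insight 1 first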
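
    To make the ordering argument concrete, the sketch below counts the pairwise distance computations that an O(n^2) clustering step needs before and after a cheap O(n) filter. The series counts are hypothetical and are not taken from the experiments.

```python
def pairwise_count(n: int) -> int:
    """Number of unordered pairs among n series: n(n-1)/2."""
    return n * (n - 1) // 2


# Hypothetical numbers, for illustration only.
total_series = 1000       # series in one service before filtering
kept_after_filter = 100   # non-stationary series left after the O(n) filter

before = pairwise_count(total_series)
after = pairwise_count(kept_after_filter)
print(f"pairs before filtering: {before}")
print(f"pairs after filtering:  {after}")
print(f"reduction factor:       {before / after:.1f}")
```

    Cutting the input to one tenth shrinks the quadratic step by roughly a factor of one hundred, which is why running the linear filter first pays off.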

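  13. TSifter: a two-step dimensionality reduction method
    Input: the raw series over a fixed-length window covering the n minutes before the anomaly occurred
    Step 1: remove stationary series (raw series → non-stationary series)
    Step 2: cluster series that take similar shapes (non-stationary series → clustered series)
    After clustering, select a representative series for each cluster
    [Figure: raw series, non-stationary series, clustered series, and representative series, with the anomaly period marked]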

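  14. Step 1: focus on the stationarity of each individual series
    ・Stationarity: the level and variability of the data and its autocorrelation structure stay constant regardless of the point in time
    ・Uses the ADF (augmented Dickey-Fuller) test, which is widely used for testing the stationarity of time-series data
    ・Tests every series one by one and removes those that are stationary
    [Figure: examples of non-stationary series that remain]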
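
    The filtering idea of step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: to stay dependency-free it uses the plain Dickey-Fuller regression with a constant and no lag terms rather than the full ADF test, and it compares the t-statistic with -2.86, the asymptotic 5% critical value for that case. The function names and the synthetic series are hypothetical. In practice a library implementation such as `adfuller` from statsmodels would be the more likely choice.

```python
import math

DF_CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, constant-only case


def df_t_statistic(y):
    """t-statistic of gamma in: y[t] - y[t-1] = alpha + gamma * y[t-1] + error."""
    x = y[:-1]
    d = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(x)
    mean_x = sum(x) / n
    mean_d = sum(d) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return -math.inf  # a constant series is trivially stationary
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, d))
    gamma = sxd / sxx
    alpha = mean_d - gamma * mean_x
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, d))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:
        return math.copysign(math.inf, gamma)
    return gamma / se


def is_stationary(y):
    """Reject the unit-root hypothesis when the statistic is below the critical value."""
    return df_t_statistic(y) < DF_CRITICAL_5PCT


def keep_non_stationary(series_by_name):
    """Step 1: test every series on its own and drop the stationary ones."""
    return {name: y for name, y in series_by_name.items() if not is_stationary(y)}


# Synthetic series (hypothetical): one fluctuates around a fixed level,
# the other starts climbing at t = 150, as a metric might during an anomaly.
steady = [math.sin(1.7 * t) for t in range(200)]
shifted = [0.1 * math.sin(1.7 * t) + 0.5 * max(0, t - 150) for t in range(200)]

kept = keep_non_stationary({"steady": steady, "shifted": shifted})
print(sorted(kept))
```

    Each series is tested independently, so the cost grows linearly with the number of series and the work can be spread across CPU cores.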

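  15. Step 2: focus on shape similarity between series
    Adopts shape-based distance (SBD), a distance measure that expresses similarity of shape
    (series are treated as similar even when shifted along the time axis or stretched along the vertical axis)
    Paparrizos, J. and Gravano, L., k-Shape: Efficient and Accurate Clustering of Time Series, SIGMOD2015.
    For speed, adopts hierarchical clustering, which can determine the number of clusters in a single run
    (it is O(n^2), but this is not a problem because the number of series per service is small)
    Selecting the representative series: the series with the smallest sum of distances to the other series in its cluster
    [Figure: series within a service → clusters → representative series of the service]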
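
    As an illustration of the two ingredients named on this slide, the sketch below computes SBD between z-normalized series and picks the cluster member with the smallest summed distance to the others. It is a simplified sketch under stated assumptions: the cross-correlation is computed with direct loops rather than the FFT used in the k-Shape paper, all series are assumed to have equal length, and the metric names and data are hypothetical.

```python
import math


def z_normalize(x):
    """Rescale to zero mean and unit variance so that only the shape matters."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / std for v in x] if std > 0 else [0.0] * len(x)


def sbd(x, y):
    """Shape-based distance: 1 - max over shifts of the normalized cross-correlation."""
    m = len(x)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 1.0
    best = -math.inf
    for shift in range(-(m - 1), m):
        # Correlate x[i] with y[i - shift] over the overlapping part.
        lo, hi = max(0, shift), min(m, m + shift)
        cc = sum(x[i] * y[i - shift] for i in range(lo, hi))
        best = max(best, cc)
    return 1.0 - best / norm


def representative(names, dist):
    """Member whose summed distance to the other members is smallest."""
    return min(names, key=lambda a: sum(dist(a, b) for b in names if b != a))


# Hypothetical cluster: a wave, the same wave scaled by 3, and a copy delayed by 2 samples.
steps = range(60)
cluster = {
    "cpu_usage": z_normalize([math.sin(0.3 * i) for i in steps]),
    "cpu_usage_x3": z_normalize([3 * math.sin(0.3 * i) for i in steps]),
    "latency": z_normalize([math.sin(0.3 * (i - 2)) for i in steps]),
}


def dist(a, b):
    return sbd(cluster[a], cluster[b])


print(f"scaled copy:  {dist('cpu_usage', 'cpu_usage_x3'):.4f}")
print(f"delayed copy: {dist('cpu_usage', 'latency'):.4f}")
print("representative:", representative(list(cluster), dist))
```

    The scaled copy has a distance of essentially zero and the delayed copy stays close, which matches the slide's remark that shifts in time and stretching along the vertical axis are both tolerated.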

  16. 3.
    Experiments and evaluation

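  17. Experimental environment
    [Figure: control server, microservice cluster, and analysis server]
    Microservice cluster: Sock Shop on Kubernetes (Front-End, Catalogue, Orders, Payment, Shipping, User, Carts)
    External load generation: Locust
    Fault injection: CPU load injection (stress-ng), network delay injection (tc)
    Series collection and storage: Prometheus
    Analysis: series retrieval module and analysis module
    Series collection interval: 5 seconds
    Series window width: 30 minutes
    Intel Xeon 3.10GHz, 8 cores, 32GB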

  18. Baseline method: Sieve
    A dimensionality reduction method for time-series data
    ・Step 1: remove metrics with small variance
    ・Step 2: cluster with k-Shape
    Sieve aims to extract characteristics of a system that can be used on a continuing basis. Its aim differs from that of this work, but it may also be applicable to other purposes.
    Thalheim, J., et al., Sieve: Actionable Insights from Monitored Metrics in Distributed Systems, Middleware2017.
    Paparrizos, J. and Gravano, L., k-Shape: Efficient and Accurate Clustering of Time Series, SIGMOD2015.

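  19. ① Accuracy: whether the series that caused each anomaly was correctly retained
    TSifter correctly extracted the causal series in every case.
    The baseline method failed only in the case of CPU overload on the shipping service.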

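  20. ② Evaluation of the dimensionality reduction rate: four anomaly cases
    ・In every case the reduction rate is 91% or higher, narrowing the series down to 1/10 or less.
    ・The baseline method achieves a slightly higher reduction rate.
    ・TSifter removes more metrics in step 1.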

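  21. ③ Evaluation of speed: execution time of each processing step
    ・Environment: 4 CPU cores, 100k metrics
    ・TSifter was 311 times faster than the baseline.
    ・(In the additional experiments described later, it was at least 270 times faster.)
    Method     Step 1: pre-filtering (sec)   Step 2: clustering (sec)   Total (sec)
    TSifter    54.41                         8.68                       63.09
    Baseline   32.33                         19590.83                   19623.16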
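
    The 311x figure can be checked directly from the execution times reported on slide 21. The snippet below is only that arithmetic; the dictionary layout and variable names are illustrative.

```python
# Execution times in seconds, as reported on slide 21 (4 CPU cores, 100k metrics).
tsifter = {"step1_filtering": 54.41, "step2_clustering": 8.68}
baseline = {"step1_filtering": 32.33, "step2_clustering": 19590.83}

tsifter_total = sum(tsifter.values())
baseline_total = sum(baseline.values())
speedup = baseline_total / tsifter_total
baseline_clustering_share = baseline["step2_clustering"] / baseline_total

print(f"TSifter total:  {tsifter_total:.2f} s")
print(f"Baseline total: {baseline_total:.2f} s")
print(f"Speedup:        {speedup:.0f}x")
print(f"Clustering share of baseline time: {baseline_clustering_share:.1%}")
```

    Almost all of the baseline's time is spent in clustering, so the gap between the two methods comes from step 2 rather than from the filtering step.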

  22. ③ Evaluation of speed: scalability
    ・Both methods scale linearly with the number of CPU cores and with the number of series.
    TSifter, execution time (sec) by number of metrics:
      Number of metrics   20000     40000     60000     80000     100000
      Filtering           10.24     20.28     31.05     42.14     54.41
      Clustering          1.21      2.43      3.81      5.72      8.68
      Total               11.45     22.71     34.86     47.86     63.09
    Baseline, execution time (sec) by number of metrics:
      Number of metrics   20000     40000     60000     80000     100000
      Filtering           2.88      7.63      13.54     22.91     32.33
      Clustering          3908.10   7773.00   11710.26  15670.81  19590.83
      Total               3910.98   7780.63   11723.80  15693.72  19623.16
    TSifter, execution time (sec) by number of CPU cores:
      Number of CPU cores 1         2         3         4
      Filtering           3.57      1.81      1.26      0.99
      Clustering          0.37      0.21      0.20      0.15
      Total               3.93      2.02      1.46      1.14
    Baseline, execution time (sec) by number of CPU cores:
      Number of CPU cores 1         2         3         4
      Filtering           0.17      0.17      0.17      0.17
      Clustering          1224.87   613.31    416.55    317.65
      Total               1225.04   613.48    416.72    317.82

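  23. Summary of the evaluation against each requirement
    ① Accuracy: the kinds of services and fault cases covered by the experiments are limited, so additional evaluation is needed.
    ② Dimensionality reduction rate: the baseline method is slightly higher. How much reduction is ultimately required remains future work.
    ③ Speed: the ideal execution time is within 1 minute. The baseline method takes 1225 seconds (about 20 minutes) and cannot meet the requirements of real operations.
    Even when the number of CPU cores and the number of series change, the ratio between the two methods' execution times stays about the same.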

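  24. Why is TSifter faster than the baseline method?
    Baseline: runs clustering repeatedly to determine the optimal number of clusters. Clustering was executed 310 times.
    TSifter: uses hierarchical clustering and sets a distance threshold to determine the number of clusters. Clustering was executed 7 times.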
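
    The difference can be illustrated with a small agglomerative clustering routine that stops merging at a distance threshold, so the number of clusters falls out of a single run instead of being searched for by rerunning the algorithm for each candidate count. This is a sketch under assumptions: the slides do not state which linkage criterion TSifter uses, so complete linkage is chosen here, and the one-dimensional toy data with absolute difference stands in for metric series compared with SBD.

```python
def cluster_by_threshold(items, dist, threshold):
    """Agglomerative clustering (complete linkage, an assumption) that stops
    merging once the closest pair of clusters is farther apart than threshold."""
    clusters = [[item] for item in items]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist(p, q) for p in a for q in b)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data (hypothetical): points on a line, compared by absolute difference.
points = [1.0, 1.1, 1.2, 5.0, 5.1, 9.0]
result = cluster_by_threshold(points, lambda a, b: abs(a - b), threshold=0.5)
print(len(result), sorted(sorted(c) for c in result))
```

    A method that needs the cluster count as an input, such as k-Shape in the baseline, has to be rerun for every candidate count, which is where the repeated executions reported above come from.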

  25. 4.
    Summary and future work

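  26. Summary and future work
    ・Proposed a dimensionality reduction method that reacts to a detected anomaly and quickly extracts, "temporarily", the metrics useful for diagnosis from a large number of metrics
      ・Within the scope of the experiments, achieved at least a 270x speedup over the baseline
      ・Roughly equivalent to the baseline in accuracy, dimensionality reduction rate, and scalability
      ・Runs in about 1 minute on 100,000 metrics
    ・Future work
      ・Additional evaluation that shows the merits of the proposal more clearly (such as choosing a more suitable baseline)
      ・Building a root cause diagnosis system that incorporates TSifter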

  27. 0.
    Supplementary slides

  28. Limitations of TSifter
    ・The analysis period is a fixed value, so fluctuations outside the analysis period cannot be taken into account.
    [Figure: time axis]