TSifter: A dimensionality reduction method for time series data suited to rapid diagnosis of performance anomalies in microservices / TSifter in proceedings of IOTS2020

The 13th IPSJ Internet and Operation Technology Symposium (IOTS2020) https://www.iot.ipsj.or.jp/symposium/iots2020-program/

Yuuki Tsubouchi (yuuk1)

December 03, 2020

Transcript

  1. Metric-based approaches

    Correlations can be discovered from each service's time series graphs, but the location of the root cause remains unknown. Approaches that apply "statistical causal discovery" have been proposed in the last few years: ① build a causal propagation graph between series, then ② infer the causal path.

    Figure labels: response time of Services A, B, C, D, E, and F; Top-1, Top-2.

    Ma, M., et al., AutoMAP: Diagnose Your Microservice-based Web Applications Automatically, WWW2020.
    Qiu, J., et al., A Causality Mining and Knowledge Graph Based Method of Root Cause Diagnosis for Performance Anomaly in Cloud Applications, Applied Sciences, 2020.
    Lin, J., et al., Microscope: Pinpoint Performance Issues with Causal Graphs in Micro-Service Environments, ICSOC2018.
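  2. Proposal: dimensionality reduction of metric time series for performance anomalies

    Purpose: a foundation for automatically inferring the propagation path of anomalies in microservices.

    As the number of series (= number of dimensions) grows, the causal propagation graph becomes huge.

    Proposal: "TSifter" (Time series Sifter), a dimensionality reduction method that reacts to the detection of an anomaly and quickly extracts, "temporarily", the series useful for diagnosis from all series.

    Three requirements:
    ・① Accuracy: series useful for diagnosis are not removed.
    ・② Dimensionality reduction rate: remove as many useless series as possible.
    ・③ Speed: find the root cause quickly (ideally in about 1 minute).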
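  3. Overall picture of the root cause diagnosis system we ultimately want to realize

    Diagram labels (steps numbered ① to ⑤): collect series; time series database; anomaly detection (YES); fetch all series; dimensionality reduction of series, per service (scope of the proposed method); root cause diagnosis (causal graph construction); causal paths between series (Service A/req_errors, Service D/connections, Service E/).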
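  4. Proposed method TSifter: requirements and solutions

    Requirements: ① accuracy, ② dimensionality reduction rate, ③ speed.

    Insight 1: series whose trend does not change before and after the anomaly occurs are unnecessary for diagnosis. → Exclude series that are stationary.

    Insight 2: groups of series whose time series graphs have similar shapes are redundant when diagnosing the anomaly. → Cluster the time series per service.

    For n series, clustering costs O(kn), O(n^2), ... → Run the exclusion step of Insight 1, which is O(n), first.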
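The ordering argument above can be illustrated with a rough cost estimate. As assumptions made only for this illustration, suppose the O(n) filter keeps a fraction r of the n series and the clustering step computes every pairwise distance:

```latex
\underbrace{n}_{\text{filter}}
+ \underbrace{\frac{rn\,(rn-1)}{2}}_{\text{pairwise distances after filtering}}
\quad\text{versus}\quad
\underbrace{\frac{n\,(n-1)}{2}}_{\text{pairwise distances without filtering}}
```

For example, with the hypothetical values n = 1000 and r = 0.2, this is 1000 + 19900 = 20900 operations instead of 499500.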
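  5. TSifter: a two-stage dimensionality reduction method

    Input: a fixed-length window covering the n minutes before the anomaly occurred.

    Step 1: remove series that are stationary.

    Step 2: cluster series that take similar shapes. After clustering, select a representative series for each cluster.

    Figure labels: raw series; non-stationary series; clustered series; representative series; anomalous period.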
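Step 1 above can be sketched as follows. The slide does not say here which stationarity test is used, so `looks_stationary` is a deliberately crude, hypothetical stand-in (a comparison of the means of the two halves of the window), not a formal unit-root test. The function names and the `threshold` parameter are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def looks_stationary(series: List[float], threshold: float = 1.0) -> bool:
    """Crude stand-in for a stationarity test (NOT a unit-root test).

    Splits the window in half and reports "stationary" when the shift in
    mean between the halves is small relative to the overall spread.
    """
    if len(series) < 4:
        return True  # too short to judge; treat as uninformative
    spread = pstdev(series)
    if spread == 0:
        return True  # a constant series never changes
    half = len(series) // 2
    shift = abs(mean(series[half:]) - mean(series[:half]))
    return shift / spread < threshold


def filter_non_stationary(window: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Step 1: keep only the series that do not look stationary."""
    return {name: s for name, s in window.items() if not looks_stationary(s)}
```

Each series is examined once, so the pass is linear in the number of series, which is why it is cheap to run before clustering. In practice the predicate would be replaced by a proper statistical test.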
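  6. Step 2: focusing on shape similarity between series

    Shape-based distance (SBD) is adopted as the distance measure for shape similarity (series are regarded as similar even if they are shifted along the time axis or scaled along the vertical axis).

    Paparrizos, J. and Gravano, L., k-Shape: Efficient and Accurate Clustering of Time Series, SIGMOD2015.

    For speed, hierarchical clustering is adopted because it can determine the number of clusters in a single run (it is O(n^2), but this is not a problem because the number of series per service is small).

    Selecting the representative series: the series whose sum of distances to the other series in the cluster is smallest.

    Figure labels: series within a service; clusters; representative series of the service.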
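The pieces of Step 2 can be sketched in pure Python as below. This is a minimal illustration, not the authors' implementation: SBD is computed with a direct (non-FFT) cross-correlation on z-normalized series, the hierarchical clustering is a naive complete-linkage variant cut at a distance threshold, and the function names, the linkage choice, and the `threshold` parameter are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def z_normalize(x: Sequence[float]) -> List[float]:
    """Rescale to zero mean and unit variance (flat series become all zeros)."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sigma == 0:
        return [0.0] * n
    return [(v - mu) / sigma for v in x]


def sbd(x: Sequence[float], y: Sequence[float]) -> float:
    """Shape-based distance: 1 - max over shifts of the normalized cross-correlation."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    zx, zy = z_normalize(x), z_normalize(y)
    norm = math.sqrt(sum(v * v for v in zx)) * math.sqrt(sum(v * v for v in zy))
    if norm == 0:
        return 1.0  # a flat series carries no shape information
    n = len(zx)
    best = max(
        sum(zx[i + s] * zy[i] for i in range(max(0, -s), min(n, n - s)))
        for s in range(-(n - 1), n)
    )
    return 1.0 - best / norm


def cluster_by_threshold(series: Dict[str, Sequence[float]], threshold: float) -> List[List[str]]:
    """Naive complete-linkage agglomerative clustering, stopped at `threshold`."""
    clusters = [[name] for name in series]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(sbd(series[a], series[b]) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break  # remaining clusters are too dissimilar to merge
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters


def select_representative(names: List[str], series: Dict[str, Sequence[float]]) -> str:
    """Pick the series whose summed distance to the other members is smallest."""
    return min(names, key=lambda a: sum(sbd(series[a], series[b]) for b in names if b != a))
```

Because the merge loop stops at a distance threshold, the number of clusters falls out of a single run rather than being fixed in advance. The sketch recomputes distances on every pass for brevity; caching the pairwise distance matrix would be the obvious first optimization.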
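  7. Experimental environment

    ・Control server: Locust (external load generation)
    ・Fault injection: CPU load injection (stress-ng), network delay injection (tc)
    ・Microservice cluster: Kubernetes running Sock Shop (Front-End, Catalogue, Orders, Payment, Shipping, User, Carts)
    ・Analysis server: Prometheus (series collection and storage), series retrieval module, analysis module
    ・Series collection interval: 5 seconds
    ・Series window width: 30 minutes
    ・Intel Xeon 3.10GHz, 8 cores, 32GB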
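  8. Baseline method: Sieve

    A dimensionality reduction method for time series data.
    ・Step 1: remove metrics with small variance.
    ・Step 2: cluster with k-Shape.

    Sieve aims to extract features of the system that can be used on a permanent basis. That purpose differs from the purpose of this study, but Sieve may be applicable to other purposes as well.

    Thalheim, J., et al., Sieve: Actionable Insights from Monitored Metrics in Distributed Systems, Middleware 2017.
    Paparrizos, J. and Gravano, L., k-Shape: Efficient and Accurate Clustering of Time Series, SIGMOD2015.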
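The baseline's first step, removing metrics with small variance, can be sketched as below. The function name and the `min_variance` cutoff are hypothetical; the slide does not give the threshold that Sieve uses.

```python
from statistics import pvariance
from typing import Dict, List


def drop_low_variance(metrics: Dict[str, List[float]], min_variance: float = 1e-3) -> Dict[str, List[float]]:
    """Keep only metrics whose population variance exceeds the cutoff."""
    return {name: values for name, values in metrics.items() if pvariance(values) > min_variance}
```

Unlike a stationarity check, this filter keeps any metric that fluctuates enough, whether or not its behavior changed around the anomaly.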
  9. ③ Evaluation of speed: scalability

    ・Both methods scale linearly with the number of CPU cores or the number of series.

    TSifter, execution time (sec) by number of metrics:
    Number of metrics: 20000, 40000, 60000, 80000, 100000
    Clustering: 1.21, 2.43, 3.81, 5.72, 8.68
    Filtering: 10.24, 20.28, 31.05, 42.14, 54.41
    Total: 11.45, 22.71, 34.86, 47.86, 63.09

    Baseline, execution time (sec) by number of metrics:
    Number of metrics: 20000, 40000, 60000, 80000, 100000
    Clustering: 3908.10, 7773.00, 11710.26, 15670.81, 19590.83
    Filtering: 2.88, 7.63, 13.54, 22.91, 32.33
    Total: 3910.98, 7780.63, 11723.80, 15693.72, 19623.16

    TSifter, execution time (sec) by number of CPU cores:
    Number of CPU cores: 1, 2, 3, 4
    Clustering: 0.37, 0.21, 0.20, 0.15
    Filtering: 3.57, 1.81, 1.26, 0.99
    Total: 3.93, 2.02, 1.46, 1.14

    Baseline, execution time (sec) by number of CPU cores:
    Number of CPU cores: 1, 2, 3, 4
    Clustering: 1224.87, 613.31, 416.55, 317.65
    Filtering: 0.17, 0.17, 0.17, 0.17
    Total: 1225.04, 613.48, 416.72, 317.82
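The linear-scaling claim can be checked against the totals reported on this slide. The short script below computes the time per 1000 metrics for each total series and the ratio between the two. It assumes that the faster series (11.45 to 63.09 sec) belongs to TSifter and the slower one to the baseline; the variable names are illustrative.

```python
# Totals (sec) copied from the slide, by number of metrics.
metrics = [20000, 40000, 60000, 80000, 100000]
fast_total = [11.45, 22.71, 34.86, 47.86, 63.09]  # assumed to be TSifter
slow_total = [3910.98, 7780.63, 11723.80, 15693.72, 19623.16]  # assumed to be the baseline


def seconds_per_1000(counts, totals):
    """Execution time normalized to 1000 metrics; roughly constant if scaling is linear."""
    return [round(t / n * 1000, 3) for n, t in zip(counts, totals)]


fast_rate = seconds_per_1000(metrics, fast_total)
slow_rate = seconds_per_1000(metrics, slow_total)
ratio = [round(s / f, 1) for s, f in zip(slow_total, fast_total)]

print("fast, sec per 1000 metrics:", fast_rate)
print("slow, sec per 1000 metrics:", slow_rate)
print("slow / fast:", ratio)
```

A near-constant per-1000 rate across the five sizes is what linear scaling in the number of series looks like; the ratio row shows how far apart the two methods are at each size.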