TSifter: A Dimensionality Reduction Method for Time Series Data Suited to Rapid Diagnosis of Performance Anomalies in Microservices

Yuuki Tsubouchi (yuuk1)

November 24, 2020
Transcript

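  1. Metric-based approach

    After an anomaly is detected, visually inspecting the time series graphs of the metrics takes time.
    Method:
    1. Build a model of how causality propagates between series across microservices (*).
    2. Infer the causal path at the granularity of individual microservices.
    [Figure: a causal path linking Service A/req_errors, Service D/connections, and Service E/disk IOPS]
    (*) Hirofumi Tsuruta, "A survey of causal discovery methods based on graphical models" (in Japanese), 2020. https://blog.tsurubee.tech/entry/2020/10/08/085158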
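  2. Proposal: dimensionality reduction of metric time series for performance anomalies

    Goal: provide a foundation for automatically inferring how an anomaly propagates through microservices.
    Proposal: "TSifter" (Time series Sifter), a method that reacts to the detection of an anomaly and quickly extracts, on a temporary basis, the series useful for diagnosis from among all series.
    - ① Accuracy: series useful for diagnosis are not removed.
    - ② Dimensionality reduction rate: remove as many useless series as possible.
    - ③ Speed: find the cause quickly (ideally in about one minute).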
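  3. Overall picture of the root cause diagnosis system we ultimately aim to build

    [Figure: a pipeline with stages numbered ① to ⑤. Components shown: data collection, time series database, anomaly detection (continue on YES), fetching all series, dimensionality reduction of the series (the scope of the proposed method), and root cause diagnosis. The output is the causal path between series: Service A/req_errors, Service D/connections, Service E/]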
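  4. Requirements of the proposed method TSifter and how it meets them

    Requirements: ① accuracy, ② dimensionality reduction rate, ③ speed.
    - Insight 1: series whose trend does not change before and after the anomaly occurs are unnecessary for diagnosis. → Exclude series that are stationary.
    - Insight 2: groups of series whose changes over time have similar shapes are redundant for diagnosis. → Cluster the time series.
    - The more series there are, the slower clustering becomes. → Run the exclusion step of Insight 1 first.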
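The exclusion step of Insight 1 can be sketched as follows. The slide does not name a stationarity test, so this sketch assumes a plain (non-augmented) Dickey-Fuller test with a constant term and the asymptotic 5% critical value of -2.86; the function names and the demo metrics are hypothetical.

```python
import math
import random

# Assumption: asymptotic 5% critical value for the constant-only Dickey-Fuller regression.
DF_CRITICAL_5PCT = -2.86

def df_statistic(series):
    """t-statistic of beta in: series[t] - series[t-1] = alpha + beta * series[t-1] + e."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series, series[1:])]
    n = len(lagged)  # needs at least 3 lagged samples
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    if sxx == 0.0:
        return -math.inf  # constant series: nothing changes, treat as stationary
    beta = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs)) / sxx
    alpha = mean_d - beta * mean_x
    ssr = sum((d - alpha - beta * x) ** 2 for x, d in zip(lagged, diffs))
    if ssr == 0.0:
        return -math.inf if beta < 0 else 0.0  # perfect fit: avoid dividing by zero
    return beta / math.sqrt(ssr / (n - 2) / sxx)

def is_stationary(series):
    """Reject the unit-root hypothesis when the statistic falls below the critical value."""
    return df_statistic(series) < DF_CRITICAL_5PCT

def drop_stationary(metrics):
    """Keep only the series that the test does not classify as stationary."""
    return {name: s for name, s in metrics.items() if not is_stationary(s)}

# Hypothetical demo data: one steady metric and one that shifts level mid-window.
random.seed(0)
metrics = {
    "steady_noise": [5 + random.gauss(0, 0.5) for _ in range(200)],
    "level_shift": [(10 if t >= 100 else 0) + random.gauss(0, 0.5) for t in range(200)],
}
kept = drop_stationary(metrics)
```

In this demo the steady metric is dropped and the metric with the level shift survives for the clustering step. A production implementation would more likely call an augmented test from a statistics library.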
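  5. TSifter: a two-stage dimensionality reduction method

    - Step 1: remove series that are stationary.
    - Step 2: cluster series that take similar shapes; after clustering, select a representative series for each cluster.
    [Figure: raw series → non-stationary series → clustered series → representative series. Annotations: the anomaly period, and a fixed-length window of n minutes before the anomaly occurs.]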
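  6. Step 2: focus on shape similarity between series

    - Within each service, cluster the series whose graphs have similar shapes.
    - Use shape-based distance (SBD) [Paparrizos 15] as a distance measure that accounts for shape similarity (slide the two series against each other and look at their correlation).
    - For speed, use hierarchical clustering, which can determine the number of clusters in a single clustering run.
    - Finally, select one series to represent each cluster.
      - Select the series whose total distance to the other series is smallest.
    [Paparrizos 15] Paparrizos, J. and Gravano, L., k-Shape: Efficient and Accurate Clustering of Time Series, The ACM Special Interest Group on Management of Data (SIGMOD), pp. 1855–1870, 2015.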
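The three parts of Step 2 (SBD, hierarchical clustering, representative selection) can be sketched in plain Python. This is an illustration under assumptions, not the authors' implementation: the complete-linkage rule, the 0.3 distance threshold, the z-normalization before SBD, and the four demo metrics are all hypothetical choices, and the demo treats the metrics as belonging to a single service.

```python
import math

def znorm(x):
    """Scale a series to zero mean and unit variance (all zeros if constant)."""
    m = sum(x) / len(x)
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / len(x))
    return [(v - m) / sd for v in x] if sd > 0 else [0.0] * len(x)

def sbd(x, y):
    """Shape-based distance: 1 - max over shifts of the normalized cross-correlation."""
    x, y = znorm(x), znorm(y)  # assumes equal-length series
    denom = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if denom == 0.0:
        return 1.0  # assumption: a constant series correlates with nothing
    n = len(x)
    best = max(
        sum(x[i + s] * y[i] for i in range(max(0, -s), min(n, n - s)))
        for s in range(-(n - 1), n)
    )
    return 1.0 - best / denom

def cluster(series, threshold):
    """Agglomerative clustering (complete linkage) that stops at an SBD threshold."""
    names = list(series)
    dist = {(a, b): sbd(series[a], series[b]) for a in names for b in names}
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        # Complete linkage: the distance between clusters is the largest pairwise SBD.
        d, i, j = min(
            (max(dist[a, b] for a in clusters[i] for b in clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break  # the threshold, not a preset k, decides the number of clusters
        merged = clusters.pop(j)
        clusters[i].extend(merged)
    return clusters, dist

def representative(members, dist):
    """Pick the member whose summed distance to the other members is smallest."""
    return min(members, key=lambda a: sum(dist[a, b] for b in members))

# Hypothetical metrics: three phase-shifted, rescaled sine waves and one step.
T = range(40)
series = {
    "cpu": [math.sin(2 * math.pi * t / 20) for t in T],
    "latency": [3 * math.sin(2 * math.pi * (t - 2) / 20) + 10 for t in T],
    "memory": [0.5 * math.sin(2 * math.pi * (t - 4) / 20) + 2 for t in T],
    "errors": [0.0 if t < 20 else 1.0 for t in T],
}
clusters, dist = cluster(series, threshold=0.3)
reps = [representative(c, dist) for c in clusters]
```

Because SBD ignores offset, scale, and time shift, the three sine-shaped metrics end up in one cluster while the step-shaped metric stays alone. Cutting the dendrogram at a distance threshold is what lets a single clustering run settle the number of clusters.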
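  7. Experimental setup

    - Application: Sock Shop
    - Testbed: built on GKE
    - Metric collection: Prometheus
    - Load generation: Locust
    - Fault injection
      - CPU load: stress-ng
      - Network delay: tc
    See the paper for the hardware configuration.
    [Figure: Sock Shop architecture diagram, https://microservices-demo.github.io/]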
  8. Baseline method: Sieve [Thalheim 17]

    - Step 1: remove metrics with small variance.
    - Step 2: cluster with the time series clustering method k-Shape, then pick a representative metric.
    Sieve aims to extract features of a system that can be used on a permanent basis, so its goal differs from that of this study, but it may be applicable to other goals as well.
    [Thalheim 17] Thalheim, J., Rodrigues, A., Akkus, I. E., Bhatotia, P., Chen, R., Viswanath, B., Jiao, L. and Fetzer, C., Sieve: Actionable Insights from Monitored Metrics in Distributed Systems, the ACM/IFIP/USENIX Middleware, pp. 14–27, 2017.
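Step 1 of the baseline, removing low-variance metrics, can be illustrated with a few lines of Python. The threshold value, the function name, and the sample metrics are hypothetical; the slide does not state how Sieve sets its cutoff.

```python
from statistics import pvariance

def drop_low_variance(metrics, min_variance=1e-3):
    """Keep only metrics whose population variance exceeds the (assumed) threshold."""
    return {name: values for name, values in metrics.items()
            if pvariance(values) > min_variance}

# Hypothetical sample metrics.
metrics = {
    "flat": [5.0, 5.0, 5.0, 5.0],
    "jitter": [5.0, 5.01, 4.99, 5.0],
    "spiky": [1.0, 9.0, 2.0, 8.0],
}
kept = drop_low_variance(metrics)
```

A variance filter keeps any metric that fluctuates enough, including one that fluctuates the same way before and after an anomaly, which is the kind of series a stationarity-based filter is meant to discard.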
  9. Speed evaluation: execution time versus number of CPU cores

    - TSifter ran more than 270 times faster than the baseline.
    - For both methods, execution time decreased as CPU cores were added.

    TSifter, execution time (sec)
    CPU cores     1        2        3        4
    Clustering    0.37     0.21     0.20     0.15
    Filtering     3.57     1.81     1.26     0.99
    Total         3.93     2.02     1.46     1.14

    Baseline, execution time (sec)
    CPU cores     1        2        3        4
    Clustering    1224.87  613.31   416.55   317.65
    Filtering     0.17     0.17     0.17     0.17
    Total         1225.04  613.48   416.72   317.82
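The 270x figure can be checked from the totals reported on this slide. The sketch below assumes the totals in the 1.14 to 3.93 second range belong to TSifter and those in the 317.82 to 1225.04 second range belong to the baseline, which is the reading consistent with the stated claim; the variable names are illustrative.

```python
cores = [1, 2, 3, 4]
baseline_total = [1225.04, 613.48, 416.72, 317.82]  # seconds, from the slide
tsifter_total = [3.93, 2.02, 1.46, 1.14]            # seconds, from the slide

# Speedup of TSifter over the baseline at each core count.
speedup = [b / t for b, t in zip(baseline_total, tsifter_total)]

# How much each method gains from going from 1 core to 4 cores.
tsifter_scaling = tsifter_total[0] / tsifter_total[-1]
baseline_scaling = baseline_total[0] / baseline_total[-1]

for c, s in zip(cores, speedup):
    print(f"{c} core(s): TSifter is {s:.0f}x faster")
print(f"1 -> 4 cores: TSifter {tsifter_scaling:.2f}x, baseline {baseline_scaling:.2f}x")
```

The smallest ratio occurs at 4 cores and is still above 270, which matches the first bullet.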
  10. Speed evaluation: execution time versus number of metrics

    - Both methods scaled linearly as the number of metrics increased.

    TSifter, execution time (sec)
    Metrics       20000     40000     60000     80000     100000
    Clustering    1.21      2.43      3.81      5.72      8.68
    Filtering     10.24     20.28     31.05     42.14     54.41
    Total         11.45     22.71     34.86     47.86     63.09

    Baseline, execution time (sec)
    Metrics       20000     40000     60000     80000     100000
    Clustering    3908.10   7773.00   11710.26  15670.81  19590.83
    Filtering     2.88      7.63      13.54     22.91     32.33
    Total         3910.98   7780.63   11723.80  15693.72  19623.16
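One simple way to check the linear-scaling claim is to look at the cost per metric, which stays roughly constant under linear growth. The sketch below uses the totals from this slide and assumes the smaller totals (11.45 to 63.09 seconds) are TSifter's and the larger ones the baseline's; the helper name is hypothetical.

```python
metric_counts = [20000, 40000, 60000, 80000, 100000]
tsifter_total = [11.45, 22.71, 34.86, 47.86, 63.09]                   # seconds
baseline_total = [3910.98, 7780.63, 11723.80, 15693.72, 19623.16]     # seconds

def per_metric_cost(totals):
    """Seconds spent per metric at each problem size."""
    return [t / n for t, n in zip(totals, metric_counts)]

tsifter_cost = per_metric_cost(tsifter_total)
baseline_cost = per_metric_cost(baseline_total)

# A max/min ratio near 1 means the cost per metric barely changes with size.
tsifter_spread = max(tsifter_cost) / min(tsifter_cost)
baseline_spread = max(baseline_cost) / min(baseline_cost)

print(f"TSifter: {min(tsifter_cost) * 1e3:.2f} to {max(tsifter_cost) * 1e3:.2f} ms per metric")
print(f"Baseline: {min(baseline_cost) * 1e3:.1f} to {max(baseline_cost) * 1e3:.1f} ms per metric")
```

Across a fivefold increase in metrics, the per-metric cost varies by about 11% for TSifter and under 1% for the baseline, so both are close to linear over this range.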