
TSifter: A Dimensionality Reduction Method for Time Series Data Suited to Rapid Diagnosis of Performance Anomalies in Microservices

Yuuki Tsubouchi (yuuk1)

November 24, 2020

Transcript

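  1. TSifter: A Dimensionality Reduction Method for Time Series Data Suited to Rapid Diagnosis of Performance Anomalies in Microservices
    Yuuki Tsubouchi (@yuuk1t), Hirofumi Tsuruta (@tsurubee3), Masahiro Furukawa (@yoyogidesaiz)
    At Hatena, 2020/11/24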

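  2. Approaches to system anomalies centered on SLIs and SLOs
    - Prediction: find behavior that resembles the behavior seen just before past SLI degradations
    - Anomaly localization: search for the anomalous component when an SLO is violated
    - Root cause investigation: (under construction)
    - Recovery: feedback control and reinforcement learning that use the SLI as the target and reward
    - SLI substitutes: combine existing metrics to build a substitute indicator for the SLI
    Diagram labels: prediction, anomaly localization, root cause investigation, recovery; detection; SLO-based alerts; current focus of interest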

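  3. Contents
    1. Research background and objectives
    2. A metric dimensionality reduction method suited to diagnosing the causes of anomalies
    3. Experiments and evaluation
    4. Summary and future work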

  4. 1. Research background and objectives

  5. The spread of microservice architectures
    Figure labels: monolithic, microservices

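  6. Problems that come with the decentralization brought by microservices
    - Growing volume of observed data
    - Complex dependencies
    - Dynamic software changes
    The cognitive load of the system rises, and diagnosing the cause of an anomaly takes longer.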

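  7. Existing approaches to diagnosing performance anomalies
    - Metrics: each one carries little information, but they are easy to collect, store, and visualize.
    - Text logs: rich in information, but some events are never written to the log.
    - Execution traces: reveal throughput and execution time per edge of the processing path, but instrumenting the application takes effort.
    Considering applicability to production environments, this work focuses on metrics.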

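  8. The metric-based approach
    Inspecting metric graphs by eye after an anomaly is detected takes time.
    Method:
    1. Build a model of how causality propagates between series across microservices ※
    2. Infer the causal path at the granularity of microservices
    Causal path (figure labels): Service A/req_errors, Service D/connections, Service E/disk IOPS
    ※ Hirofumi Tsuruta, A survey of causal discovery methods based on graphical models (in Japanese), 2020. https://blog.tsurubee.tech/entry/2020/10/08/085158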

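  9. Challenges of the metric-based approach
    - A fixed number of metrics to use for diagnosis must be specified in advance
      - e.g. response latency only, or {response latency, CPU utilization, memory usage, …}
    - Metrics closer to the cause may be excluded from the result
    As many metrics as possible need to be analyzed.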

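  10. Proposal: dimensionality reduction of metric series for performance anomalies
    Goal: provide a foundation for automatically inferring how an anomaly propagates through microservices
    Proposal: "TSifter" (Time series Sifter), a method that reacts to anomaly detection and quickly extracts, on a temporary basis, the series useful for diagnosis from all series
    - ① Accuracy: series useful for diagnosis are not removed
    - ② Dimensionality reduction rate: remove as many useless series as possible
    - ③ Speed: find the cause quickly (ideally in about 1 minute)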

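  11. Overall picture of the root cause diagnosis system to be built eventually
    Figure labels: data collection, time series database, anomaly detection (YES), fetch all series, dimensionality reduction of series, root cause diagnosis, scope of the proposed method
    Causal path between series (figure labels): Service A/req_errors, Service D/connections, Service E/
    Step markers in the figure: ③ ④ ⑤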

  12. 2. A metric dimensionality reduction method suited to diagnosing the causes of performance anomalies

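  13. Requirements of the proposed method TSifter and how they are met
    Requirements: ① accuracy, ② dimensionality reduction rate, ③ speed
    - Insight 1: series whose trend does not change before and after the anomaly are unnecessary for diagnosis
      → exclude series that are stationary
    - Insight 2: groups of series whose changes over time have similar shapes are redundant for diagnosis
      → cluster the time series
    - The more series there are, the slower clustering becomes
      → run the exclusion step of insight 1 first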

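  14. TSifter: a two-stage dimensionality reduction method
    - Step 1: remove stationary series (raw series → non-stationary series)
    - Step 2: cluster series that take similar shapes, then select a representative series for each cluster (non-stationary series → clustered series → representative series)
    - Window: a fixed-length window covering the n minutes before the anomaly occurred
    Other figure label: anomaly period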

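  15. Step 1: focus on the stationarity of each metric
    - Stationarity: the mean and variance of the data are constant over time, and the autocovariance depends only on the time lag
    - Use the ADF test, which is widely used to test time series data for stationarity
    - Test every series one by one and remove those that are stationary
    Figure: examples of stationary series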
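
The filtering step on this slide can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses the plain Dickey-Fuller regression (the ADF test without lagged difference terms) so that it runs with the standard library alone, and the 5% significance level, the function names, and the sample data are all assumptions. In practice a library routine such as `statsmodels.tsa.stattools.adfuller` would be the more usual choice.

```python
import math
import random

# Asymptotic 5% critical value of the Dickey-Fuller t-statistic for the
# "constant, no trend" regression. Using the 5% level is an assumption.
DF_CRITICAL_5PCT = -2.86


def df_statistic(series):
    """t-statistic of b in  dy[t] = a + b * y[t-1] + e[t]  (no lagged differences)."""
    x = series[:-1]
    dy = [cur - prev for prev, cur in zip(series, series[1:])]
    n = len(x)
    mean_x, mean_dy = sum(x) / n, sum(dy) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    if sxx < 1e-12:
        return -math.inf  # a flat series never changes: treat it as stationary
    sxy = sum((xv - mean_x) * (dv - mean_dy) for xv, dv in zip(x, dy))
    b = sxy / sxx
    a = mean_dy - b * mean_x
    ssr = sum((dv - a - b * xv) ** 2 for xv, dv in zip(x, dy))
    se = math.sqrt(ssr / (n - 2) / sxx)
    if se == 0.0:
        return 0.0 if b == 0.0 else math.copysign(math.inf, b)
    return b / se


def drop_stationary(metrics):
    """Keep only the series for which a unit root cannot be rejected."""
    return {name: s for name, s in metrics.items()
            if df_statistic(s) >= DF_CRITICAL_5PCT}


rng = random.Random(0)
metrics = {
    # White noise around a constant level: stationary, so it should be dropped.
    "flat_noise": [rng.gauss(0.0, 1.0) for _ in range(60)],
    # Flat, then a steady climb: the trend changes, so it should be kept.
    "ramp": [rng.gauss(0.0, 0.1) for _ in range(30)]
            + [float(t) + rng.gauss(0.0, 0.1) for t in range(1, 31)],
}
print(list(drop_stationary(metrics)))
```

Each series is tested independently, so this step parallelizes trivially across series.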

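  16. Step 2: focus on shape similarity between series
    - Within each service, cluster the series whose graphs have similar shapes
    - Adopt shape-based distance (SBD) [Paparrizos 15] as a distance measure that accounts for shape similarity (slide one series against the other and look at the correlation)
    - For speed, adopt hierarchical clustering, which can determine the number of clusters in a single clustering run
    - Finally, select one representative series for each cluster
      - Select the series whose total distance to the other series is smallest
    [Paparrizos 15] Paparrizos, J. and Gravano, L., k-Shape: Efficient and Accurate Clustering of Time Series, The ACM Special Interest Group on Management of Data (SIGMOD), pp. 1855–1870, 2015.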
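
The clustering step on this slide can be sketched as follows. It is an illustration under stated assumptions rather than the authors' code: SBD is computed by brute force as 1 minus the maximum normalized cross-correlation over all shifts of the z-normalized series, the linkage is assumed to be single linkage (the slide does not name one), and the threshold of 0.3, the function names, and the sample series are hypothetical.

```python
import math
from itertools import combinations


def z_normalize(s):
    n = len(s)
    mean = sum(s) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in s) / n)
    return [0.0] * n if sd == 0.0 else [(v - mean) / sd for v in s]


def sbd(x, y):
    """Shape-based distance for equal-length series: 0 means identical shape."""
    x, y = z_normalize(x), z_normalize(y)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        return 1.0  # a constant series has no shape to correlate (a choice made here)
    n = len(x)
    best = max(
        sum(x[i] * y[i - shift] for i in range(max(0, shift), min(n, n + shift)))
        for shift in range(-(n - 1), n)
    )
    return 1.0 - best / norm


def cluster_and_pick(metrics, threshold):
    """Single-linkage clusters cut at `threshold`; returns {representative: members}."""
    names = list(metrics)
    dist = {frozenset(p): sbd(metrics[p[0]], metrics[p[1]])
            for p in combinations(names, 2)}
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Cutting a single-linkage dendrogram at a distance threshold is the same
    # as joining every pair whose distance is within the threshold.
    for pair, d in dist.items():
        if d <= threshold:
            a, b = pair
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)

    def total_distance(n, members):
        return sum(dist[frozenset((n, m))] for m in members if m != n)

    # Representative: the member with the smallest total distance to the others.
    return {min(members, key=lambda n: total_distance(n, members)): members
            for members in clusters.values()}


a = [math.sin(2 * math.pi * i / 20) for i in range(40)]
b = [3 * math.sin(2 * math.pi * (i - 2) / 20) + 5 for i in range(40)]  # shifted, rescaled copy of a
c = [float(i) for i in range(40)]  # steady ramp, a different shape
result = cluster_and_pick({"a": a, "b": b, "c": c}, threshold=0.3)
print(result)
```

The brute-force cross-correlation is quadratic in the series length; the k-Shape paper computes it with an FFT, which matters once windows become long.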

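  17. Limitations of TSifter
    - Because the analysis period is a fixed value, changes outside the analysis period cannot be taken into account
      - For example, with the analysis period set to 30 minutes in advance, a series whose trend starts to change 40 minutes before the anomaly is detected and which is stationary from 30 minutes before to 0 minutes before is removed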
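
The example on this slide can be made concrete with a small sketch. The series and the one-sample-per-minute resolution are hypothetical; the point is only that a change 40 minutes before detection falls outside a 30-minute window.

```python
def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


# One sample per minute for the 60 minutes before detection (hypothetical data).
# The level jumps 40 minutes before detection and stays flat afterwards.
series = [10.0] * 20 + [50.0] * 40

window_minutes = 30
window = series[-window_minutes:]  # what the fixed analysis window sees

print(spread(series))  # 40.0: the change is visible over the full hour
print(spread(window))  # 0.0: inside the window the series looks unchanged
```

Any filter that only inspects `window` has no way to tell this series apart from one that never changed.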

  18. 3. Experiments and evaluation

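  19. Experimental setup
    - Application: Sock Shop
    - Testbed: built on GKE
    - Metric collection: Prometheus
    - Load generation: Locust
    - Fault injection
      - CPU load: stress-ng
      - Network latency: tc
    See the accompanying paper for the hardware configuration.
    Sock Shop architecture diagram: https://microservices-demo.github.io/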

  20. Baseline method: Sieve [Thalheim 17]
    - Step 1: remove metrics with small variance
    - Step 2: cluster with the time series clustering method k-Shape, then pick representative metrics
    Sieve aims to extract features of a system that can be used on an ongoing basis. Its purpose differs from that of this work, but it may be applicable to other purposes as well.
    [Thalheim 17] Thalheim, J., Rodrigues, A., Akkus, I. E., Bhatotia, P., Chen, R., Viswanath, B., Jiao, L. and Fetzer, C., Sieve: Actionable Insights from Monitored Metrics in Distributed Systems, the ACM/IFIP/USENIX Middleware, pp. 14–27, 2017.

  21. Accuracy: whether the causal metric was retained in each fault case
    - TSifter correctly extracted the causal metric in every case
    - The baseline method failed only in the CPU overload case of the shipping service

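  22. Evaluation of the dimensionality reduction rate
    - In every case the reduction rate is 91% or higher, so the metrics are narrowed down to one tenth or less
    - The baseline method's reduction rate is slightly higher
    - TSifter removes more metrics in step 1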
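
The relation between a 91% reduction rate and "one tenth or less" can be checked with a short calculation. The definition used here (fraction of series removed) is the natural reading of the slide, and the counts are hypothetical rather than taken from the experiments.

```python
def reduction_rate(total_series, kept_series):
    """Fraction of series removed by the dimensionality reduction."""
    return 1.0 - kept_series / total_series


# Hypothetical counts, not taken from the experiments.
total, kept = 1000, 90
rate = reduction_rate(total, kept)
print(f"{rate:.0%} removed, {kept / total:.0%} kept")
```

A rate of 91% or more leaves at most 9% of the series, which is below one tenth of the original set.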

  23. Speed evaluation: execution time versus number of CPU cores
    - TSifter ran more than 270 times faster than the baseline
    - For both methods, execution time scaled with the number of CPU cores
    Execution time (sec) by number of CPU cores, read from the two plots:
    TSifter
      CPU cores        1        2        3        4
      Filtering     3.57     1.81     1.26     0.99
      Clustering    0.37     0.21     0.20     0.15
      Total         3.93     2.02     1.46     1.14
    Baseline
      CPU cores        1        2        3        4
      Filtering     0.17     0.17     0.17     0.17
      Clustering 1224.87   613.31   416.55   317.65
      Total      1225.04   613.48   416.72   317.82
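
The "more than 270 times" figure can be reproduced from the total execution times shown in the plots on this slide. The sketch below simply divides the baseline totals by the TSifter totals for each core count; the variable names are illustrative.

```python
cores = [1, 2, 3, 4]
tsifter_total = [3.93, 2.02, 1.46, 1.14]             # seconds, from the slide
baseline_total = [1225.04, 613.48, 416.72, 317.82]   # seconds, from the slide

# How many times faster TSifter is than the baseline at each core count.
speedups = [b / t for b, t in zip(baseline_total, tsifter_total)]
for n, s in zip(cores, speedups):
    print(f"{n} core(s): {s:.0f}x faster")

# How much each method gains from going from 1 core to 4 cores.
print(f"TSifter 1 -> 4 cores: {tsifter_total[0] / tsifter_total[-1]:.2f}x")
print(f"Baseline 1 -> 4 cores: {baseline_total[0] / baseline_total[-1]:.2f}x")
```

The ratio ranges from roughly 279 at four cores to roughly 312 at one core, which is consistent with the "at least 270 times" claim.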

  24. Speed evaluation: execution time versus number of metrics
    - For both methods, execution time grew linearly with the number of metrics
    Execution time (sec) by number of metrics, read from the two plots:
    TSifter
      Metrics       20000     40000     60000     80000    100000
      Filtering     10.24     20.28     31.05     42.14     54.41
      Clustering     1.21      2.43      3.81      5.72      8.68
      Total         11.45     22.71     34.86     47.86     63.09
    Baseline
      Metrics       20000     40000     60000     80000    100000
      Filtering      2.88      7.63     13.54     22.91     32.33
      Clustering  3908.10   7773.00  11710.26  15670.81  19590.83
      Total       3910.98   7780.63  11723.80  15693.72  19623.16
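
One way to check the linear-scaling claim is to look at the cost per metric, which should stay roughly constant if execution time is linear in the number of metrics. The sketch below uses the total times from the plots on this slide; the helper name is illustrative.

```python
metric_counts = [20000, 40000, 60000, 80000, 100000]
tsifter_total = [11.45, 22.71, 34.86, 47.86, 63.09]                 # seconds
baseline_total = [3910.98, 7780.63, 11723.80, 15693.72, 19623.16]   # seconds


def ms_per_metric(totals):
    """Milliseconds spent per metric at each data point."""
    return [1000.0 * t / m for t, m in zip(totals, metric_counts)]


for label, totals in (("TSifter", tsifter_total), ("Baseline", baseline_total)):
    costs = ms_per_metric(totals)
    print(label, [f"{c:.2f}" for c in costs],
          f"max/min = {max(costs) / min(costs):.2f}")
```

The per-metric cost stays within a few percent for the baseline and within roughly 11% for TSifter, so both are close to linear over this range, and TSifter handles 100,000 metrics in about one minute.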

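  25. Summary of the evaluation against each requirement
    - ① Accuracy: the experiments cover a limited set of service types and fault cases, so further evaluation is needed
    - ② Dimensionality reduction rate: the baseline method is slightly ahead. How high a reduction rate a root cause diagnosis system requires remains future work
    - ③ Speed: the ideal execution time is within 1 minute; the baseline method takes 1225 seconds (about 20 minutes) and cannot meet the requirements in practice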

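  26. Why TSifter is faster than the baseline
    - To determine the optimal number of clusters, the baseline method runs the k-Shape algorithm repeatedly while varying the number of clusters
      - Clustering is executed multiple times
    - TSifter's hierarchical clustering can determine the number of clusters after clustering has run, using a distance threshold
      - Clustering is executed once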
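
The "clustering is executed once" point can be illustrated with a small sketch. Assuming single linkage (the slides do not name the linkage), one agglomerative pass records the distance at which each merge happens, and the number of clusters for any threshold then follows by counting, with no further clustering run. The pairwise distances below are made up for the example.

```python
def merge_distances(names, dist):
    """One single-linkage pass: the distance at which each merge happens."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    merges = []
    # Visit pairs from closest to farthest; a pair triggers a merge only if
    # its two series are still in different clusters.
    for (a, b), d in sorted(dist.items(), key=lambda item: item[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            merges.append(d)
    return merges


def cluster_count(n_series, merges, threshold):
    """Clusters left after applying every merge at or below the threshold."""
    return n_series - sum(1 for d in merges if d <= threshold)


names = ["A", "B", "C", "D"]
# Hypothetical pairwise distances.
dist = {("A", "B"): 0.1, ("C", "D"): 0.2, ("A", "C"): 0.9,
        ("A", "D"): 0.9, ("B", "C"): 0.9, ("B", "D"): 0.9}

merges = merge_distances(names, dist)  # computed once
for threshold in (0.05, 0.15, 0.5, 1.0):
    print(threshold, cluster_count(len(names), merges, threshold))
```

By contrast, a method that needs the number of clusters as an input, such as k-Shape, has to be rerun for every candidate value.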

  27. 4. Summary and future work

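  28. Summary and future work
    - Proposed a dimensionality reduction method for extracting the metrics useful for diagnosis from a large number of metrics
    - Within the scope of the experiments, it was at least 270 times faster than the baseline method
    - Accuracy, dimensionality reduction rate, and scalability were roughly equivalent
    - It can process 100,000 metrics in about 1 minute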

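  29. Future work
    - Build a root cause diagnosis system that incorporates TSifter
    - Apply TSifter to Chaos Engineering
      - When a phenomenon that differs from the prior hypothesis (known-unknowns) occurs after an "experiment", use TSifter to diagnose it
      - Because TSifter scans all metrics, it copes well with unknown phenomena (unknown-unknowns)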
