Slide 1

Slide 1 text

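TSifter: A Dimensionality Reduction Method for Time Series Data Aimed at Rapid Diagnosis of Performance Anomalies in Microservices
Yuuki Tsubouchi (@yuuk1t), Hirofumi Tsuruta (@tsurubee3), Masahiro Furukawa (@yoyogidesaiz) @Hatena
2020/11/24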

Slide 2

Slide 2 text

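An approach to system anomalies centered on SLIs and SLOs
- Prediction: find behavior that resembles the behavior seen just before past SLI degradations
- Anomaly localization: search for the anomalous location when an SLO is violated
- Root cause investigation: (under construction)
- Recovery: feedback control and reinforcement learning that use the SLI as the target or reward
- SLI substitutes: combine existing metrics to build a proxy indicator for the SLI
Diagram labels: Prediction, Detection (SLO-based alerts), Anomaly localization, Root cause investigation, Recovery; Current focus of interest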

Slide 3

Slide 3 text

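Contents
1. Research background and objectives
2. A dimensionality reduction method for metrics aimed at diagnosing the cause of anomalies
3. Experiments and evaluation
4. Summary and future work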

Slide 4

Slide 4 text

1. Research background and objectives

Slide 5

Slide 5 text

The spread of microservice architectures
Figure labels: Monolithic, Microservices

Slide 6

Slide 6 text

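Problems that come with distribution into microservices
- Growth in the volume of observed data
- Complexity of dependencies
- Dynamic changes to software
→ The cognitive load of the system increases
→ Diagnosing the cause of an anomaly takes more time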

Slide 7

Slide 7 text

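Existing approaches to diagnosing performance anomalies
- Metrics: each data point carries little information, but metrics are easy to collect, store, and visualize.
- Text logs: rich in information, but some events are never written to the logs.
- Execution traces: capture throughput and execution time for each edge of the processing path, but instrumenting the application takes effort.
Considering applicability to production environments, this work focuses on metrics.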

Slide 8

Slide 8 text

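Metric-based approaches
After an anomaly is detected, inspecting the time series graphs of metrics by eye takes time.
Methods:
1. Build a causal propagation model between series that spans microservices (*)
2. Infer the causal path at the granularity of microservices
(*) Hirofumi Tsuruta, "A survey of causal discovery methods based on graphical models" (in Japanese), 2020. https://blog.tsurubee.tech/entry/2020/10/08/085158
Figure labels: causal path; Service A/req_errors, Service D/connections, Service E/disk IOPS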

Slide 9

Slide 9 text

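Challenges of metric-based approaches
- A fixed set of metrics to use for diagnosis must be specified in advance
  - Examples: response latency only, or {response latency, CPU utilization, memory usage, …}
- Metrics closer to the root cause may be excluded from the result
→ As many metrics as possible need to be analyzed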

Slide 10

Slide 10 text

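Proposal: dimensionality reduction of metric time series for performance anomalies
Objective: provide a foundation for automatically inferring the propagation path of anomalies in microservices
Proposal: "TSifter" (Time series Sifter), a method that reacts to the detection of an anomaly and quickly extracts, "on a temporary basis", the series useful for diagnosis from all series
- (1) Accuracy: series useful for diagnosis are not removed
- (2) Dimensionality reduction rate: remove as many useless series as possible
- (3) Speed: find the cause quickly (ideally in about 1 minute)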

Slide 11

Slide 11 text

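Overall picture of the root cause diagnosis system we ultimately want to build
Diagram labels: Data collection, Time series database, Anomaly detection (YES), Fetch all series, Dimensionality reduction of series, Root cause diagnosis; steps are numbered (1) to (5)
Scope of the proposed method: dimensionality reduction of series
Output: causal path between series (Service A/req_errors, Service D/connections, Service E/)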

Slide 12

Slide 12 text

2. A dimensionality reduction method for metrics aimed at diagnosing the cause of performance anomalies

Slide 13

Slide 13 text

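Requirements of the proposed method TSifter and how they are addressed
Requirements: (1) Accuracy, (2) Dimensionality reduction rate, (3) Speed
- Insight 1: series whose trend does not change before and after the anomaly occurs are unnecessary for diagnosis → exclude series that are stationary
- Insight 2: groups of series whose changes over time have similar shapes are redundant for diagnosis → cluster the time series
- The more series there are, the slower the clustering → run the exclusion step of Insight 1 first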
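To make the ordering argument concrete, here is a minimal sketch of how the number of pairwise distance computations shrinks when filtering runs first. It assumes the clustering step needs a full pairwise distance matrix; the series counts are hypothetical and do not come from the experiments.

```python
def pairwise_comparisons(n_series: int) -> int:
    """Number of distinct series pairs in a full pairwise distance matrix."""
    return n_series * (n_series - 1) // 2


# Hypothetical counts, for illustration only.
total_series = 1000      # series collected for one anomaly
after_filtering = 200    # series left once stationary ones are removed

pairs_before = pairwise_comparisons(total_series)
pairs_after = pairwise_comparisons(after_filtering)

print(f"pairs without filtering: {pairs_before}")
print(f"pairs after filtering:   {pairs_after}")
print(f"reduction factor:        {pairs_before / pairs_after:.1f}")
```

Because the pair count grows quadratically, removing a large share of series in the cheap per-series step pays off disproportionately in the clustering step.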

Slide 14

Slide 14 text

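TSifter: a two-stage dimensionality reduction method
- Input: raw series over a fixed-length window of n minutes before the anomaly occurred (the window includes the anomalous period)
- Step 1: remove series that are stationary (raw series → non-stationary series)
- Step 2: cluster series that take similar shapes (non-stationary series → clustered series)
- After clustering, select a representative series for each cluster (clustered series → representative series)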

Slide 15

Slide 15 text

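Step 1: focus on the stationarity of individual metrics
- Stationarity: the property that the mean and variance of the data are constant over time and the autocovariance depends only on the time lag
- Use the ADF (augmented Dickey-Fuller) test, which is widely used to test time series data for stationarity
- Test every series one by one and remove those that are stationary
Figure: example of a stationary series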
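The per-series filter can be sketched as follows. This is not the ADF test named in the slide: to stay dependency-free it uses a simplified Dickey-Fuller regression with a constant and no lagged difference terms, compared against the asymptotic 5% critical value of -2.86. A real implementation would more likely call an existing ADF routine such as `adfuller` in statsmodels. The metric names, the synthetic data, and the significance level are assumptions made for illustration.

```python
import math
import random

# Asymptotic 5% critical value for the Dickey-Fuller regression with a constant term.
DF_CRITICAL_5PCT = -2.86


def dickey_fuller_t(series):
    """t-statistic of rho in: y[t] - y[t-1] = c + rho * y[t-1] + error."""
    if min(series) == max(series):
        return float("-inf")  # perfectly flat series: treated as stationary in this sketch
    lagged = series[:-1]
    diffs = [cur - prev for prev, cur in zip(series[:-1], series[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, diffs))
    if sxx == 0:
        return 0.0  # degenerate input: treated as non-stationary in this sketch
    rho = sxy / sxx
    intercept = mean_y - rho * mean_x
    sse = sum((y - intercept - rho * x) ** 2 for x, y in zip(lagged, diffs))
    if sse == 0:
        return 0.0  # noise-free fit: treated as non-stationary in this sketch
    return rho / math.sqrt(sse / (n - 2) / sxx)


def is_stationary(series):
    """Reject the unit-root hypothesis when the statistic is below the critical value."""
    return dickey_fuller_t(series) < DF_CRITICAL_5PCT


def drop_stationary(metrics):
    """Keep only the series that are not judged stationary."""
    return {name: s for name, s in metrics.items() if not is_stationary(s)}


rng = random.Random(0)
metrics = {
    # Noise around a constant level: no change in trend.
    "flat_noise": [rng.gauss(0, 1) for _ in range(120)],
    # Level jumps halfway through the window, as it might at the start of an anomaly.
    "level_shift": [rng.gauss(0, 0.1) + (10 if t >= 60 else 0) for t in range(120)],
}
remaining = drop_stationary(metrics)
print(sorted(remaining))
```

Each series is tested independently, so this step parallelizes trivially across CPU cores.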

Slide 16

Slide 16 text

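Step 2: focus on shape similarity between series
- Within each service, cluster the series whose graphs have similar shapes
- Adopt shape-based distance (SBD) [Paparrizos 15] as a distance measure that accounts for shape similarity (slide the two series against each other and look at their correlation)
- For speed, adopt hierarchical clustering, which can determine the number of clusters in a single clustering run
- Finally, select one representative series for each cluster
  - Select the series whose total distance to the other series is smallest
[Paparrizos 15] Paparrizos, J. and Gravano, L., k-Shape: Efficient and Accurate Clustering of Time Series, The ACM Special Interest Group on Management of Data (SIGMOD), pp. 1855–1870, 2015.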
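A minimal pure-Python sketch of the distance and the representative selection described above. SBD is computed here as one minus the maximum normalized cross-correlation over all shifts of two z-normalized, equal-length series. The direct O(n²) loop is chosen for readability; it is an assumption of this sketch, not a statement about how TSifter implements it. The series names and data are hypothetical.

```python
import math


def znorm(series):
    """Scale a series to zero mean and unit variance (all zeros if it is constant)."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    if std == 0:
        return [0.0] * len(series)
    return [(v - mean) / std for v in series]


def sbd(x, y):
    """Shape-based distance: 1 - max over shifts of the normalized cross-correlation."""
    x, y = znorm(x), znorm(y)
    n = len(x)  # this sketch assumes len(x) == len(y)
    denom = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if denom == 0:
        return 1.0  # a constant series has no shape; treated here as "no similarity"
    best = max(
        sum(x[i] * y[i - shift] for i in range(max(0, shift), min(n, n + shift)))
        for shift in range(-(n - 1), n)
    )
    return 1.0 - best / denom


def representative(cluster):
    """Name of the series whose summed SBD to the other cluster members is smallest."""
    return min(
        cluster,
        key=lambda name: sum(sbd(cluster[name], cluster[other]) for other in cluster),
    )


# Hypothetical series: two sine waves offset by two samples, and a ramp.
a = [math.sin(2 * math.pi * t / 20) for t in range(40)]
b = [math.sin(2 * math.pi * (t - 2) / 20) for t in range(40)]
c = [float(t) for t in range(40)]

print(f"sbd(a, b) = {sbd(a, b):.3f}")
print(f"sbd(a, c) = {sbd(a, c):.3f}")
print("representative:", representative({"a": a, "b": b, "c": c}))
```

Because of the z-normalization and the maximum over shifts, the distance ignores differences in scale, offset, and small time lags, which is what makes it suitable for grouping metrics that react to the same event.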

Slide 17

Slide 17 text

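Limitations of TSifter
- Because the analysis window is a fixed value, variation outside the window cannot be taken into account
  - For example, suppose the analysis window is set to 30 minutes in advance. If a series starts to change its trend 40 minutes before the anomaly is detected and is stationary from 30 minutes before to 0 minutes before, that series is removed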
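The limitation can be illustrated with a small sketch. It assumes one sample per minute and a hypothetical series whose level changes 40 minutes before detection; a simple "does the value change at all" check stands in for the stationarity test.

```python
def analysis_window(series, window_minutes):
    """Return the last `window_minutes` samples, assuming one sample per minute."""
    return series[-window_minutes:]


# Hypothetical metric covering the 60 minutes before detection:
# the level changes 40 minutes before detection and stays flat afterwards.
series = [1.0] * 20 + [5.0] * 40

window = analysis_window(series, 30)
change_visible_in_window = len(set(window)) > 1
change_visible_in_full_series = len(set(series)) > 1

print("change visible in the 30-minute window:", change_visible_in_window)
print("change visible in the full 60 minutes: ", change_visible_in_full_series)
```

Inside the 30-minute window the series looks unchanged, so a filter that only sees the window has no way to keep it.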

Slide 18

Slide 18 text

3. Experiments and evaluation

Slide 19

Slide 19 text

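Experimental setup
- Application: Sock Shop
- Testbed: built on GKE
- Metric collection: Prometheus
- Load generation: Locust
- Fault injection
  - CPU load: stress-ng
  - Network delay: tc
See the paper for the hardware configuration.
Figure: Sock Shop architecture diagram, https://microservices-demo.github.io/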

Slide 20

Slide 20 text

Baseline method: Sieve [Thalheim 17]
- Step 1: remove metrics with small variance
- Step 2: cluster with the time series clustering method k-Shape, then select a representative metric
Sieve aims to extract features of the system that can be used on a permanent basis, which differs from the aim of this study, but it may also be applicable to other aims.
[Thalheim 17] Thalheim, J., Rodrigues, A., Akkus, I. E., Bhatotia, P., Chen, R., Viswanath, B., Jiao, L. and Fetzer, C., Sieve: Actionable Insights from Monitored Metrics in Distributed Systems, the ACM/IFIP/USENIX Middleware, pp. 14–27, 2017.

Slide 21

Slide 21 text

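Accuracy: whether the metric that is the cause was correctly retained in each failure case
- TSifter correctly extracted the causal metric in every case
- The baseline method was incorrect only in the case of CPU overload on the shipping service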

Slide 22

Slide 22 text

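Evaluation of the dimensionality reduction rate
- In every case the dimensionality reduction rate is 91% or higher, narrowing the metrics down to less than 1/10
- The baseline method has a slightly higher dimensionality reduction rate
- TSifter removes more metrics in Step 1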
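As a reading aid, the reduction rate can be written as the fraction of series that were removed. The sketch below uses hypothetical counts chosen to land on the 91% figure; they are not taken from the experiments.

```python
def reduction_rate(total: int, kept: int) -> float:
    """Fraction of series removed by the dimensionality reduction."""
    return 1.0 - kept / total


# Hypothetical counts, for illustration only.
total_metrics = 1000
kept_metrics = 90

rate = reduction_rate(total_metrics, kept_metrics)
print(f"reduction rate: {rate:.0%}")
print(f"fraction kept:  {kept_metrics / total_metrics:.2f}")
```

A rate of 91% leaves 9% of the series, which is why the slide describes the result as less than one tenth of the original set.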

Slide 23

Slide 23 text

Speed evaluation: execution time versus number of CPU cores
- TSifter ran more than 270 times faster than the baseline
- For both methods, execution time scaled with the number of CPU cores

TSifter, execution time (sec)
Number of CPU cores   1        2        3        4
Clustering            0.37     0.21     0.20     0.15
Filtering             3.57     1.81     1.26     0.99
Total                 3.93     2.02     1.46     1.14

Baseline, execution time (sec)
Number of CPU cores   1        2        3        4
Clustering            1224.87  613.31   416.55   317.65
Filtering             0.17     0.17     0.17     0.17
Total                 1225.04  613.48   416.72   317.82
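The speedup claim can be checked directly against the reported totals. The sketch below assumes that the panel with the 1225-second total is the baseline, as the evaluation summary slide states; the variable names are illustrative.

```python
# Total execution time in seconds, keyed by number of CPU cores (from the slide).
baseline_total = {1: 1225.04, 2: 613.48, 3: 416.72, 4: 317.82}
tsifter_total = {1: 3.93, 2: 2.02, 3: 1.46, 4: 1.14}

# How many times faster TSifter is than the baseline at each core count.
speedup = {
    cores: baseline_total[cores] / tsifter_total[cores] for cores in baseline_total
}

# How much each method gains from going from 1 core to 4 cores.
baseline_scaling = baseline_total[1] / baseline_total[4]
tsifter_scaling = tsifter_total[1] / tsifter_total[4]

for cores, ratio in speedup.items():
    print(f"{cores} core(s): TSifter is {ratio:.0f}x faster")
print(f"1 -> 4 cores: baseline {baseline_scaling:.2f}x, TSifter {tsifter_scaling:.2f}x")
```

The smallest ratio occurs at 4 cores and is still above 270, which is consistent with the "more than 270 times" statement.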

Slide 24

Slide 24 text

Speed evaluation: execution time versus number of metrics
- For both methods, execution time scaled linearly with the number of metrics

TSifter, execution time (sec)
Number of metrics   20000     40000     60000      80000      100000
Clustering          1.21      2.43      3.81       5.72       8.68
Filtering           10.24     20.28     31.05      42.14      54.41
Total               11.45     22.71     34.86      47.86      63.09

Baseline, execution time (sec)
Number of metrics   20000     40000     60000      80000      100000
Clustering          3908.10   7773.00   11710.26   15670.81   19590.83
Filtering           2.88      7.63      13.54      22.91      32.33
Total               3910.98   7780.63   11723.80   15693.72   19623.16
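One way to read the linear-scaling claim is that the cost per metric stays roughly constant as the number of metrics grows. The sketch below computes seconds per 10,000 metrics from the reported totals; it assumes that the panel whose totals reach about one minute is TSifter, consistent with the summary slide.

```python
metric_counts = [20000, 40000, 60000, 80000, 100000]
# Total execution time in seconds (from the slide).
tsifter_total = [11.45, 22.71, 34.86, 47.86, 63.09]
baseline_total = [3910.98, 7780.63, 11723.80, 15693.72, 19623.16]


def seconds_per_10k(counts, totals):
    """Execution time normalized to 10,000 metrics for each measurement."""
    return [total / (count / 10000) for count, total in zip(counts, totals)]


tsifter_cost = seconds_per_10k(metric_counts, tsifter_total)
baseline_cost = seconds_per_10k(metric_counts, baseline_total)

# A ratio close to 1.0 means the per-metric cost barely changes with scale.
tsifter_spread = max(tsifter_cost) / min(tsifter_cost)
baseline_spread = max(baseline_cost) / min(baseline_cost)

print("TSifter  sec per 10k metrics:", [round(v, 2) for v in tsifter_cost])
print("Baseline sec per 10k metrics:", [round(v, 2) for v in baseline_cost])
print(f"spread: TSifter {tsifter_spread:.2f}, baseline {baseline_spread:.2f}")
```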

Slide 25

Slide 25 text

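Summary of the evaluation against each requirement
- (1) Accuracy: the types of services and failure cases in the experiments are limited, so additional evaluation is needed
- (2) Dimensionality reduction rate: the baseline method is slightly higher. What reduction rate a root cause diagnosis system requires is left for future work
- (3) Speed: an execution time within 1 minute is ideal; the baseline method takes 1225 seconds (about 20 minutes), which cannot meet requirements in the field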

Slide 26

Slide 26 text

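Why TSifter is faster than the baseline
- To determine the optimal number of clusters, the baseline method runs the k-Shape algorithm repeatedly while varying the number of clusters
  - Clustering is executed multiple times
- TSifter's hierarchical clustering can determine the number of clusters after the clustering run by using a distance threshold
  - Clustering is executed once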
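The point about determining the number of clusters in a single run can be sketched with a naive agglomerative clustering that stops merging once the closest pair of clusters is farther apart than a threshold. Complete linkage, the threshold value, and the toy one-dimensional distance matrix are assumptions of this sketch; the slides do not state which linkage TSifter uses. Library routines such as SciPy's `linkage` followed by `fcluster` with `criterion='distance'` provide the same idea more efficiently.

```python
def agglomerative(dist, threshold):
    """Merge the closest clusters until the smallest linkage exceeds `threshold`.

    `dist` is a symmetric matrix of pairwise distances (list of lists).
    Returns clusters as lists of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: distance between the farthest members (an assumption).
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:
            break  # no remaining pair is close enough; the cluster count is now fixed
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Toy example: six items placed on a line, distance = absolute difference.
positions = [0, 1, 2, 50, 51, 90]
dist = [[abs(a - b) for b in positions] for a in positions]

clusters = agglomerative(dist, threshold=10)
print(len(clusters), "clusters:", clusters)
```

The number of clusters is never passed in: it falls out of the threshold, so there is no need to rerun the clustering for several candidate values of k as a k-Shape search does.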

Slide 27

Slide 27 text

4. Summary and future work

Slide 28

Slide 28 text

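Summary
- We proposed a dimensionality reduction method for extracting the metrics useful for diagnosis from a large number of metrics
- Within the scope of the experiments, the method was at least 270 times faster than the baseline method
  - It is roughly equivalent in accuracy, dimensionality reduction rate, and scalability
  - It can process 100,000 metrics in about 1 minute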

Slide 29

Slide 29 text

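Future work
- Build a root cause diagnosis system that incorporates TSifter
- Apply TSifter to Chaos Engineering
  - When a phenomenon that differs from the prior hypothesis (known-unknowns) occurs after an "experiment", use TSifter to diagnose that phenomenon
  - Because TSifter scans all metrics, it is well suited to dealing with unknown phenomena (unknown-unknowns)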