Slide 1

Slide 1 text

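TSifter: A Dimensionality Reduction Method for Time Series Data Suited to Rapid Diagnosis of Performance Anomalies in Microservices
Yuuki Tsubouchi (SAKURA internet, Kyoto University)
Hirofumi Tsuruta (SAKURA internet)
Masahiro Furukawa (Hatena)
IPSJ 13th Internet and Operation Technology Symposium (IOTS2020)
December 3, 2020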

Slide 2

Slide 2 text

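Outline
1. Research background and objective
2. A dimensionality reduction method for metrics suited to root-cause diagnosis of anomalies
3. Experiments and evaluation
4. Summary and future work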

Slide 3

Slide 3 text

1. Research background and objective

Slide 4

Slide 4 text

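The spread of microservice architectures
Monolith → microservices: a transition to a distributed architecture split by function.
The software behind web services has grown in scale, and it is becoming difficult for developers to change it.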

Slide 5

Slide 5 text

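Concerns when operating microservices
・Growing volume of monitoring data
・Complexity of dependencies
・Higher frequency of software changes
→ The cognitive load imposed by the system rises, and diagnosing the cause of a performance anomaly takes more time.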

Slide 6

Slide 6 text

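Existing approaches to diagnosing performance anomalies
・Metrics: each data point carries little information, but metrics are easy to collect, store, and visualize.
・Text logs: rich in information, but some events are never written to a log.
・Execution traces: expose the throughput and execution time of each edge of a processing path, but instrumenting the application takes effort.
Considering applicability to real environments, this work focuses on metrics.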

Slide 7

Slide 7 text

Metric-based approaches
Correlations can be found from the time series graphs of each service, but the location of the cause remains unknown.
Approaches that apply statistical causal discovery have been proposed in the last few years:
① Build a causal propagation graph between series.
② Infer the causal path.
[Figure: response-time series of Services A through F, a causal graph built over them, and inferred paths ranked Top-1 and Top-2]
Ma, M., et al., "AutoMAP: Diagnose Your Microservice-based Web Applications Automatically", WWW 2020.
Qiu, J., et al., "A Causality Mining and Knowledge Graph Based Method of Root Cause Diagnosis for Performance Anomaly in Cloud Applications", Applied Sciences, 2020.
Lin, J., et al., "Microscope: Pinpoint Performance Issues with Causal Graphs in Micro-Service Environments", ICSOC 2018.

Slide 8

Slide 8 text

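Challenges of metric-based approaches
・The combination of metric types used for diagnosis is fixed (about 1 to 7 metrics).
・Examples: response latency only, or {response latency, CPU utilization, memory usage, …}.
・Metrics closer to the cause may be excluded from the result, for example when TCP retransmission errors occur but network bandwidth changes only slightly.
As many metric series as possible need to be explored.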

Slide 9

Slide 9 text

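Proposal: dimensionality reduction of metric series for performance anomalies
Objective: a foundation for automatically inferring the propagation path of an anomaly in microservices.
As the number of series (= number of dimensions) grows, the causal propagation graph becomes huge.
Proposal: "TSifter" (Time series Sifter), a dimensionality reduction method that reacts to anomaly detection and quickly extracts, from all series, the series that are "temporarily" useful for diagnosis.
Three requirements
・① Accuracy: series useful for diagnosis are not removed.
・② Dimensionality reduction rate: remove as many useless series as possible.
・③ Speed: find the cause quickly (ideally in about 1 minute).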

Slide 10

Slide 10 text

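Overall picture of the root-cause diagnosis system we ultimately aim to build
[Figure: pipeline with stages numbered ① to ⑤: collection of series into a time series database, anomaly detection (on YES, continue), retrieval of all series, dimensionality reduction of the series per service, and root-cause diagnosis (causal graph construction). The scope of the proposed method is marked around the dimensionality reduction. The output is a causal path between series such as Service A/req_errors, Service D/connections, Service E/]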

Slide 11

Slide 11 text

2. A dimensionality reduction method for metrics suited to root-cause diagnosis of performance anomalies

Slide 12

Slide 12 text

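Requirements of the proposed method TSifter and how they are addressed
① Accuracy and ② dimensionality reduction rate
・Insight 1: series whose trend does not change before and after the anomaly are unnecessary for diagnosis. → Exclude series that are stationary.
・Insight 2: groups of series whose time series graphs have similar shapes are redundant for diagnosis. → Cluster the time series per service.
③ Speed
・For n series, clustering costs O(kn), O(n^2), ... → Run the O(n) exclusion of Insight 1 first.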
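The ordering argument can be made concrete with a rough cost model. This is an illustrative sketch rather than a formula from the slides: the surviving fraction r and the constants c_1 and c_2 are assumed symbols, and the quadratic term stands in for whichever clustering cost applies.

```latex
% Assumed symbols (not in the slides): n = number of series,
% r = fraction of series kept by the stationarity filter (0 < r <= 1),
% c_1, c_2 = per-operation constants of filtering and clustering.
\begin{align*}
T_{\text{cluster only}} &= c_2\, n^2 \\
T_{\text{filter first}} &= c_1\, n + c_2\, (r n)^2
\end{align*}
```

Under this model, a filter that keeps one series in ten (r = 0.1) shrinks the quadratic term by a factor of 100, at the price of one linear pass.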

Slide 13

Slide 13 text

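TSifter: a two-stage dimensionality reduction method
・Step 1: remove series that are stationary (raw series → non-stationary series).
・Step 2: cluster series that take similar shapes, then select a representative series from each cluster (non-stationary series → clustered series → representative series).
The input is a fixed-length window covering the n minutes before the anomaly occurred.
[Figure: the two steps applied to example series, with the anomaly period marked]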

Slide 14

Slide 14 text

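Step 1: focus on the stationarity of individual series
・Stationarity: the level and variability of the data, and its autocorrelation structure, stay constant regardless of the point in time.
・Uses the ADF (augmented Dickey-Fuller) test, which is widely used for testing the stationarity of time series data.
・Tests every series one at a time and removes those that are stationary.
[Figure: examples of non-stationary series that remain]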
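Step 1 can be sketched with the standard library alone. The code below uses the plain Dickey-Fuller regression, that is, the ADF test with zero lag terms, as a simplified stand-in. The slides do not state the library, lag order, or significance level used, so the 5% critical value and the names `df_statistic`, `is_stationary`, and `sift` are assumptions made for illustration.

```python
import math
import random

# Asymptotic 5% critical value of the Dickey-Fuller t-statistic for the
# constant-only model (assumption: the slides give no significance level).
DF_CRITICAL_5PCT = -2.86


def df_statistic(y):
    """t-statistic of gamma in dy[t] = a + gamma * y[t-1] + e[t] (no lag terms).

    Expects at least 4 samples.
    """
    if max(y) == min(y):
        return -math.inf  # a constant series never changes: treat as stationary
    x = y[:-1]
    dy = [cur - prev for prev, cur in zip(y[:-1], y[1:])]
    n = len(dy)
    mean_x, mean_dy = sum(x) / n, sum(dy) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    if sxx == 0.0:
        return math.inf  # only the last point moved: cannot reject a unit root
    sxy = sum((v - mean_x) * (d - mean_dy) for v, d in zip(x, dy))
    gamma = sxy / sxx
    intercept = mean_dy - gamma * mean_x
    ssr = sum((d - intercept - gamma * v) ** 2 for v, d in zip(x, dy))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    if std_err == 0.0:
        return math.inf if gamma >= 0 else -math.inf
    return gamma / std_err


def is_stationary(y):
    """Reject the unit-root hypothesis when the statistic is below the critical value."""
    return df_statistic(y) < DF_CRITICAL_5PCT


def sift(series):
    """Keep only the non-stationary series (mapping of name -> list of samples)."""
    return {name: y for name, y in series.items() if not is_stationary(y)}


rng = random.Random(0)
series = {
    "noise": [rng.gauss(0.0, 1.0) for _ in range(120)],  # fluctuates around a fixed level
    "level_shift": [0.0] * 60 + [10.0] * 60,             # level changes mid-window
}
kept = sift(series)
```

Each series is tested independently, so the pass is linear in the number of series and can be spread across CPU cores.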

Slide 15

Slide 15 text

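Step 2: focus on shape similarity between series
・Adopts shape-based distance (SBD) as the distance measure for shape similarity (series are treated as similar even when shifted along the time axis or scaled along the vertical axis).
  Paparrizos, J. and Gravano, L., "k-Shape: Efficient and Accurate Clustering of Time Series", SIGMOD 2015.
・For speed, adopts hierarchical clustering, which can determine the number of clusters in a single run (it is O(n^2), but this is not a problem because the number of series per service is small).
・Representative series selection: the series with the smallest sum of distances to the other series in its cluster.
[Figure: series within a service → clusters → representative series of the service]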
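A minimal standard-library sketch of Step 2 follows. SBD is computed as one minus the maximum normalized cross-correlation over all time shifts, after z-normalization. The linkage criterion (complete linkage), the threshold value 0.2, the sample data, and all function names are assumptions for illustration, since the slides do not specify them. The cross-correlation here is the direct O(m^2) sum rather than an FFT-based one.

```python
import math


def znorm(x):
    """Z-normalize so that offset and vertical scale do not affect the distance."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / sd for v in x] if sd > 0 else [0.0] * len(x)


def sbd(x, y):
    """Shape-based distance between two equal-length series."""
    x, y = znorm(x), znorm(y)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        return 1.0  # flat series has no shape to compare (arbitrary choice here)
    n = len(x)
    best = max(
        sum(x[i] * y[i - s] for i in range(max(0, s), min(n, n + s)))
        for s in range(-(n - 1), n)
    )
    return 1.0 - best / norm


def cluster_by_threshold(series, threshold):
    """Agglomerative clustering (complete linkage) that stops at a distance threshold."""
    names = list(series)
    dist = {(a, b): sbd(series[a], series[b]) for a in names for b in names}

    def linkage(c1, c2):
        return max(dist[a, b] for a in c1 for b in c2)

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break  # the closest pair is already too far apart
        clusters[i] += clusters.pop(j)
    return clusters, dist


def representative(members, dist):
    """Medoid: the member with the smallest total distance to the other members."""
    return min(members, key=lambda a: sum(dist[a, b] for b in members if b != a))


wave = [math.sin(2 * math.pi * t / 20) for t in range(60)]
step = [0.0] * 30 + [1.0] * 30
service_series = {  # hypothetical metrics of one service
    "cpu": wave,
    "mem": [3 * v + 5 for v in wave],   # same shape, different scale and offset
    "latency": step,
    "errors": [0.0] * 33 + [1.0] * 27,  # same shape, shifted in time
}
clusters, dist = cluster_by_threshold(service_series, threshold=0.2)
reps = [representative(c, dist) for c in clusters]
```

Because the threshold fixes where merging stops, the number of clusters falls out of a single run per service instead of being searched for over repeated runs.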

Slide 16

Slide 16 text

3. Experiments and evaluation

Slide 17

Slide 17 text

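Experimental environment
[Figure: a control server, a microservice cluster, and an analysis server]
・Microservice cluster: Sock Shop on Kubernetes (Front-End, Catalogue, Orders, Payment, Shipping, User, Carts)
・External load generation: Locust
・Fault injection: CPU load (stress-ng) and network delay (tc)
・Series collection and storage: Prometheus
・Analysis server: series retrieval module and analysis module
・Series collection interval: 5 seconds
・Series window width: 30 minutes
・Hardware: Intel Xeon 3.10GHz, 8core, 32GB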
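The collection interval and window width fix how many samples each series contributes. The sketch below works that out and builds a range query against the Prometheus HTTP API endpoint `/api/v1/query_range`. The host name, label selector, and timestamp are hypothetical, and the slides do not show how the series retrieval module actually queries Prometheus.

```python
from urllib.parse import urlencode

SCRAPE_INTERVAL_SEC = 5  # from the slide: series collected every 5 seconds
WINDOW_MIN = 30          # from the slide: 30-minute window


def samples_per_window(window_min=WINDOW_MIN, interval_sec=SCRAPE_INTERVAL_SEC):
    """Number of collection intervals that fit into one analysis window."""
    return window_min * 60 // interval_sec


def query_range_url(base_url, promql, anomaly_ts):
    """Build a range-query URL for the window that ends at the anomaly timestamp."""
    params = {
        "query": promql,
        "start": anomaly_ts - WINDOW_MIN * 60,
        "end": anomaly_ts,
        "step": SCRAPE_INTERVAL_SEC,
    }
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


# Hypothetical host, selector, and Unix timestamp, for illustration only.
url = query_range_url(
    "http://prometheus.example:9090", '{namespace="sock-shop"}', 1606960800
)
```

With these settings each series spans 360 collection intervals, which is the series length the stationarity test and the distance computation operate on.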

Slide 18

Slide 18 text

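Baseline method: Sieve
A dimensionality reduction method for time series data.
・Step 1: remove metrics with small variance.
・Step 2: cluster with k-Shape.
Sieve aims to extract features of a system that can be used on a permanent basis. That objective differs from this work, but the method may be applicable to other objectives as well.
Thalheim, J., et al., "Sieve: Actionable Insights from Monitored Metrics in Distributed Systems", Middleware 2017.
Paparrizos, J. and Gravano, L., "k-Shape: Efficient and Accurate Clustering of Time Series", SIGMOD 2015.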

Slide 19

Slide 19 text

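① Accuracy: whether the series that caused each anomaly was correctly retained
・TSifter extracted the causal series correctly in all cases.
・The baseline method failed only in the case of CPU overload on the shipping service.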

Slide 20

Slide 20 text

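② Evaluation of dimensionality reduction rate: 4 anomaly cases
・In every case the dimensionality reduction rate is 91% or higher, narrowing the series down to 1/10 or less.
・The baseline method has a slightly higher dimensionality reduction rate.
・TSifter removes more metrics in Step 1.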

Slide 21

Slide 21 text

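③ Evaluation of speed: execution time of each processing step
・Environment: 4 CPU cores, 100k metrics.
・TSifter was 311 times faster than the baseline.
・(In the additional experiments described later, it was at least 270 times faster.)

            Step 1 (sec)   Step 2 (sec)   Total execution
            pre-removal    clustering     time (sec)
TSifter     54.41          8.68           63.09
Baseline    32.33          19590.83       19623.16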

Slide 22

Slide 22 text

③ Evaluation of speed: scalability
・Both methods scale linearly with the number of CPU cores or the number of series.
[Figure: four stacked bar charts of execution time (sec); the values printed under each chart are listed here]

TSifter, execution time (sec) by number of metrics
Number of metrics   20000      40000      60000      80000      100000
Clustering          1.21       2.43       3.81       5.72       8.68
Filtering           10.24      20.28      31.05      42.14      54.41
Total               11.45      22.71      34.86      47.86      63.09

Baseline, execution time (sec) by number of metrics
Number of metrics   20000      40000      60000      80000      100000
Clustering          3908.10    7773.00    11710.26   15670.81   19590.83
Filtering           2.88       7.63       13.54      22.91      32.33
Total               3910.98    7780.63    11723.80   15693.72   19623.16

Baseline, execution time (sec) by number of CPU cores
Number of CPU cores 1          2          3          4
Clustering          1224.87    613.31     416.55     317.65
Filtering           0.17       0.17       0.17       0.17
Total               1225.04    613.48     416.72     317.82

TSifter, execution time (sec) by number of CPU cores
Number of CPU cores 1          2          3          4
Clustering          0.37       0.21       0.20       0.15
Filtering           3.57       1.81       1.26       0.99
Total               3.93       2.02       1.46       1.14
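The speedup figures quoted in the deck can be re-derived from the chart totals. The sketch below transcribes those totals and divides baseline time by TSifter time. Pairing each chart with its method is inferred from the magnitudes and from the 1225-second baseline figure quoted on a later slide, so treat that pairing as an assumption.

```python
# Total execution times (sec) transcribed from the scalability charts.
by_metrics = {  # number of metrics -> (TSifter, baseline)
    20000: (11.45, 3910.98),
    40000: (22.71, 7780.63),
    60000: (34.86, 11723.80),
    80000: (47.86, 15693.72),
    100000: (63.09, 19623.16),
}
by_cores = {  # number of CPU cores -> (TSifter, baseline)
    1: (3.93, 1225.04),
    2: (2.02, 613.48),
    3: (1.46, 416.72),
    4: (1.14, 317.82),
}


def speedups(table):
    """Baseline time divided by TSifter time for every configuration."""
    return {key: baseline / tsifter for key, (tsifter, baseline) in table.items()}


metric_speedups = speedups(by_metrics)
core_speedups = speedups(by_cores)
lowest = min(list(metric_speedups.values()) + list(core_speedups.values()))
```

The ratio at 100000 metrics rounds to 311, and the smallest ratio across both sweeps stays above 270, consistent with the figures stated in the slides.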

Slide 23

Slide 23 text

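Summary of the evaluation against each requirement
・① Accuracy: the experiments covered a limited set of service types and failure cases, so additional evaluation is needed.
・② Dimensionality reduction rate: the baseline method is slightly higher. The degree of reduction ultimately required is future work.
・③ Speed: the execution-time ratio between the two methods stays about the same as the number of CPU cores and the number of series vary. An execution time within 1 minute is ideal; the baseline method takes 1225 seconds (20 minutes) and cannot meet the requirement in practice.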

Slide 24

Slide 24 text

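Why is TSifter faster than the baseline method?
・Baseline: runs clustering repeatedly to determine the optimal number of clusters. Clustering ran 310 times.
・TSifter: hierarchical clustering, with the number of clusters determined by setting a distance threshold. Clustering ran 7 times.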

Slide 25

Slide 25 text

4. Summary and future work

Slide 26

Slide 26 text

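Summary and future work
・Proposed a dimensionality reduction method that reacts to anomaly detection and quickly extracts, from a large number of metrics, the metrics that are "temporarily" useful for diagnosis.
  ・Within the scope of the experiments, achieved a speedup of at least 270 times over the baseline.
  ・Roughly equivalent in accuracy, dimensionality reduction rate, and scalability.
  ・Runs in about 1 minute on 100,000 metrics.
・Future work
  ・Additional evaluation that makes the merits of the proposal clearer (for example, choosing a more suitable baseline).
  ・Building a root-cause diagnosis system that incorporates TSifter.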

Slide 27

Slide 27 text

0. Supplementary slides

Slide 28

Slide 28 text

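Limitations of TSifter
・The analysis period is a fixed value, so fluctuations outside the analysis period cannot be taken into account.
[Figure: timeline (time axis)]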