Upgrade to Pro — share decks privately, control downloads, hide ads and more …

分散アプリケーションの異常の原因を即時に診断するための手法の構想 / Causality Tracing in Distributed Applications

分散アプリケーションの異常の原因を即時に診断するための手法の構想 / Causality Tracing in Distributed Applications

Tweet

More Decks by Yuuki Tsubouchi (yuuk1)

Other Decks in Research

Transcript

  1. 6 ɾϦΫΤετ୯ҐͷτϨʔγϯάʹΑΔࠜຊݪҼ෼ੳ[1] ɾΞϓϦέʔγϣϯʹܭଌίʔυΛ௥Ճ͠ͳ͚Ε͹ͳΒͳ͍ ɾߏ੒ཁૉؒͷґଘάϥϑͱϝτϦοΫΛར༻ͨࠜ͠ຊݪҼ෼ੳ[2][3] ɾґଘؔ܎ͷมԽ΍ϫʔΫϩʔυͷมԽ͕͋Δͱɺґଘؔ܎ͷநग़ॲ ཧΛ΍Γ௚͞ͳ͚Ε͹ͳΒͳ͍ => ଈ࣌ੑΛຬͨ͞ͳ͍ ɾ·ͨ͸ɺ֤ߏ੒ཁૉ͝ͱʹαʔϏεϨϕϧΛ୅ද͢ΔࢦඪʢSLOʣ Λઃܭ͠ɺ؂ࢹ͢ΔͨΊͷख͕ؒ͋Δ

    ઌߦݚڀͷ՝୊ [1]: H. Jayathilaka, C. Krintz and R. Wolski, Performance monitoring and root cause analysis for cloud-hosted web applications, WWW, pp. 469–478, 2017. [2]: J. Thalheim, A. Rodrigues, I. Akkus, and others, Sieve: Actionable Insights from Monitored Metrics in Distributed Systems, ACM/IFIP/USENIX Middleware, pp.14-27 2017. [3]: J. Lin, C. Pengfei, and Z. Zibin, "Microscope: Pinpoint performance issues with causal graphs in micro-service environments." ICSO, pp.3-20, 2018.
  2. 7 1. ௿ϊΠζͰ͋Δ͜ͱͱߏ੒ཁૉ୯ҐͰࢦඪΛઃܭ͢Δखؒͷ௿ݮ ↪ αʔϏεΛ୅ද͢ΔࢦඪͷΈΛ؂ࢹ (Ԡ౴࣌ؒ΍Ԡ౴Τϥʔ཰ͳͲ) ؂ࢹΞϥʔτΛܖػʹTCP઀ଓґଘάϥϑͱ֤ϊʔυ্ͷϝτϦο ΫΛ୳ࡧ ୅දࢦඪͱ࣌ܥྻతಛ௃͕૬ؔ͢Δ΋ͷΛݪҼͷީิͱ͢Δ 2.

    ଈ࣌ੑ ֤ϊʔυ্ͷશϝτϦοΫΛ෼ੳ͢ΔͱͳΔͱॲཧ͕஗͘ͳΔ ↪ ϝτϦοΫऔಘॲཧΛ֤ϊʔυʹ෼ࢄอ࣋ɾ໰͍߹ΘͤʹΑΓߴ ଎Խ ↪ ࣌ܥྻͷ૬ؔ෼ੳॲཧΛGPGPUʹΑΓߴ଎Խ ↪ աڈͷॲཧ݁ՌΛ࠶ར༻͠ɺߴ଎൑ఆ ݚڀͷ໨త
  3. 11 ؔ࿈ݚڀᶄ: Sieve B. H. Sigelman, et al., Dapper, a

    Large-Scale Distributed Systems Tracing Infrastructure, Technical report, Google 2010. ɾطଘͷ؂ࢹج൫͔ΒΞΫγϣϯՄೳͳώϯτΛಘΔγεςϜΛఏҊ ɾ࣌ܥྻΫϥελ෼ੳʹΑΓίϯϙʔωϯτؒͷґଘؔ܎Λࣝผ͢Δ ɾ՝୊: ߏ੒มߋ΍ϫʔΫϩʔυͷมԽʹ௥ै͢ΔͨΊʹ͸ɺ෼ੳε ςοϓΛ࠷ॳ͔Β΍Γ௚͢ඞཁ͕͋Δ [2]: J. Thalheim, A. Rodrigues, I. Akkus, and others, Sieve: Actionable Insights from Monitored Metrics in Distributed Systems, ACM/IFIP/USENIX Middleware, pp.14-27 2017.
  4. 15 ఏҊख๏ͷલఏ ɾؔ࿈ݚڀᶅ MicroscopeͷҼՌਪ࿦ख๏Λϕʔεʹ͢Δ ɾαʔϏεશମͷࢦඪͷΈΛఆٛ͠ɺ؂ࢹΞϥʔτΛઃఆ͢Δ ɾϗετ্ͷ͢΂ͯͷϝτϦοΫ ɾϓϩηεΛϊʔυɺTCP઀ଓΛΤοδͱͨ͠׬શͳωοτϫʔΫґ ଘάϥϑ[4] Λར༻ͯ͠ɺҟৗͷҼՌؔ܎άϥϑΛߏங͢Δ [4]

    ௶಺༎थ, ݹ઒խେ, দຊ྄հ, “Transtracer: ෼ࢄγεςϜʹ͓͚ΔTCP/UDP௨৴ͷऴ୺఺ͷ؂ࢹʹΑΔϓϩηεؒґଘؔ܎ͷࣗಈ௥ ੻”, Πϯλʔωοτͱӡ༻ٕज़γϯϙδ΢Ϝ࿦จू, 2019, 64-71 (2019-11-28), 2019೥12݄.
  5. 16 ఏҊख๏ͷϫʔΫϑϩʔʢਤʣ reqs/sec errors/sec latency CPU usage … ݪҼީิϦετ Frontend

    Component ֤ϊʔυ͸ ෳ਺ͷϝτϦοΫΛ΋ͭ 1. ୅දࢦඪͷ ҟৗݕ஌ 2. ґଘάϥϑ͔Βҟৗ ͷҼՌؔ܎άϥϑΛߏங (component, metric, score) (component, metric, score) . . .