Design and Implementation of an Observability Tool Focused on the Relationships Between Processes in Distributed Systems / Transtracer CNDK2019

CloudNative Days KANSAI 2019

Yuuki Tsubouchi (yuuk1)

November 28, 2019
Transcript

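  1. SAKURA internet Inc. (C) Copyright 1996-2019 SAKURA internet Inc, SAKURA internet Research Center

    Design and Implementation of an Observability Tool Focused on the Relationships Between Processes in Distributed Systems
    SAKURA internet Research Center, Yuuki Tsubouchi / @yuuk1t
    CloudNative Days Kansai 2019, 2019.11.28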
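  2. [p.2] Self-introduction: Yuuki Tsubouchi / ゆううき, https://yuuk.io/

    Career
    ・Hatena Co., Ltd.: web operations engineer and SRE (development and operation of web services, 5 years)
    ・Currently: SAKURA internet Inc., SAKURA internet Research Center, researcher (research on Internet infrastructure technology)
    Site Reliability Engineering (SRE) Researcher, @yuuk1t, id:y_uuki
    ・IPSJ Special Interest Group on Internet and Operation Technology, steering committee member
    ・Lecturer at the Security Camp national convention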
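  3. [p.7] Background: why distributed applications have become complex

    Causes
    ・Features added over a long period of service operation
    ・Growth in user access
    ・A single service operator providing multiple services (SNS, e-commerce sites, etc.)
    Consequences
    ・Addition of purpose-specific middleware
    ・Mixture of old and new systems
    ・Growth in the number of hosts through scale-out
    ・Parts shared across multiple services (user authentication infrastructure, etc.)
    ・Microservices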
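  4. [p.10] The idea for solving the problem

    Goal: trace dependencies on processes that are unknown to the administrator
    Issues: 1. false negatives in dependency tracing 2. latency overhead
    Solution: monitor sockets, the endpoints of TCP/UDP connections, and trace flows automatically
    ・As long as a process uses sockets, its flows can be traced exhaustively
    ・Socket monitoring is independent of the process's own communication, so it adds no latency overhead
    Diagram: processes in user space exchange TCP/UDP flows through the Linux OS kernel, and the sockets are monitored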
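  5. [p.14] Typical distributed application architecture over roughly the last 10 years

    ・In addition to the three-tier web architecture, NoSQL servers and similar components have been added
    Diagram components: Web server, Application server, RDB server, KVS server, Full text search server, Batch server, Internal DNS server, External DNS Server
    Diagram legend: Application flow, DNS flow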
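  6. [p.15] Composition of a single Host/Pod

    ・Connection management is delegated to reverse proxies and sidecar proxies
    ・Log collection agents and monitoring agents are co-located on the host
    Diagram components: Main network process, Proxy, User Authentication, DNS forwarder, Log collector agent, Monitoring agent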
  7. [p.18] Observability

    Reference: [Sridharan 17] Cindy Sridharan, Monitoring in the time of Cloud Native, Velocity, 2017.
    Figure: Low Observability vs. High Observability.
    ・Low Observability: Human Systems and Monitoring Systems, with Logs, Metrics, Alerting, Checking, Investigating (top, sar, iostat, tail …)
    ・High Observability: Human Systems, Monitoring Systems and Observability Systems, with Logs, Metrics, Traces, Alerting, Checking, Investigating
  8. [p.22] Request-based approach

    ・Assign an identifier to each Layer 7 request, embed it in the subsequent requests, and propagate it to the downstream processes
    ・Use the identifier to trace which processes in the system the request passed through
    ・Advantage: application-level processing and L7 protocol information can be traced
    ・Issues: latency overhead, false negatives, instrumentation cost (same as p.8)
    References:
    M. Y. Chen, et al., Pinpoint: Problem Determination in Large, Dynamic Internet Services, IEEE/IFIP International Conference on DSN, pp. 595–604, 2002.
    P. Barham, et al., Magpie: Online Modelling and Performance-aware Systems, HotOS, pp. 85–90, 2003.
    R. Fonseca, et al., X-Trace: A Pervasive Network Tracing Framework, USENIX Conference on NSDI, pp. 20–20, 2007.
    B. H. Sigelman, et al., Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, Technical report, Google, 2010.
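The identifier propagation described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the mechanism of any of the cited systems: the header name `X-Request-Id`, the functions `ensure_trace_id` and `handle`, and the `topology` dictionary are all hypothetical, and the downstream calls are simulated by plain function calls instead of real network requests.

```python
import uuid

TRACE_HEADER = "X-Request-Id"  # hypothetical header name used for illustration


def ensure_trace_id(headers):
    """Return a copy of headers that carries an identifier, assigning one if missing."""
    out = dict(headers)
    out.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    return out


def handle(process_name, headers, downstream, log):
    """Record that this process saw the request, then forward the same identifier."""
    headers = ensure_trace_id(headers)
    log.append((headers[TRACE_HEADER], process_name))
    for next_process in downstream.get(process_name, []):
        handle(next_process, headers, downstream, log)
    return headers[TRACE_HEADER]


# Hypothetical call graph: web -> app -> (rdb, kvs)
topology = {"web": ["app"], "app": ["rdb", "kvs"]}
trace_log = []
trace_id = handle("web", {}, topology, trace_log)

# Every process that handled the request can be recovered from the shared identifier.
path = [name for tid, name in trace_log if tid == trace_id]
print(path)
```

Every process has to call something like `handle` for the trace to be complete, which is where the instrumentation cost and the false negatives listed on the slide come from.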
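  9. [p.23] Google Dapper [Sigelman 2010]

    ・The prototype of today's distributed tracing technologies (OpenTelemetry, etc.)
    ・Characterized by low overhead and application-level transparency
    ・Adaptive sampling
      ・The sampling rate can be changed according to the volume of traffic
    ・RPC instrumentation library
      ・A C++ library that creates spans, writes logs, and performs sampling
    Figure 5. Dapper collection pipeline
    [Sigelman 2010]: B. H. Sigelman, et al., Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, Technical report, Google, 2010.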
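  10. [p.24] Facebook Canopy [Kaldor 2017]

    An extension of Dapper that makes it possible to join heterogeneous data
    ・End-to-end (e2e) tracing that covers browsers, mobile apps, and backends
    ・Issue 1: e2e performance data is heterogeneous in execution model, granularity, and quality
    ・Issue 2: the enormous volume of trace data
    ・Builds a pipeline for extracting performance data from the whole stack
    ・Each stage of the pipeline can be customized
    Figure 2.
    [Kaldor 2017]: J. Kaldor, et al., Canopy: An end-to-end performance tracing and analysis system, ACM SOSP, 2017.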
  11. [p.26] Packet-based approach

    ・Traces dependencies from Layer 3 packets
    ・Collects packets from existing traffic and analyzes information such as the source and destination hosts and ports in the packet headers and the times at which packets were sent and received
    ・I have not yet seen this used in a real cloud environment
    ・Details are omitted
    References:
    P. Bahl, et al.: Towards Highly Reliable Enterprise Network Services via Inference of Multi-Level Dependencies, ACM SIGCOMM Review, Vol. 37, No. 4, pp. 13–24, 2007.
    X. Chen, et al.: Automating Network Application Dependency Discovery: Experiences, Limitations, and New Solutions, USENIX Symposium on OSDI, pp. 117–130, 2008.
    L. Popa, et al.: Macroscope: End-Point Approach to Networked Application Dependency Discovery, CoNEXT, pp. 229–240, 2009.
    A. Natarajan, et al.: NSDMiner: Automated Discovery of Network Service Dependencies, IEEE INFOCOM, pp. 2507–2515, 2012.
    A. Zand, et al.: Rippler: Delay Injection for Service Dependency Detection, IEEE INFOCOM, pp. 2157–2165, 2014.
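  12. [p.27] Summary of the issues with each approach

    Request-based approach
    ・Instrumentation cost
      ・The effort of adding tracing code inside existing components
    ・False negatives
      ・Tracing code is added by hand, so coverage is incomplete
      ・Because of the instrumentation cost, tracing tends to be introduced in only part of the system
    Connection-based approach
    ・False negatives
      ・A connection that has already become persistent cannot be detected partway through
      ・The unit of tracing is the IP address, not the process
    Latency overhead: the overhead of inserting additional processing into the communication path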
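  13. [p.31] Transtracer system architecture

    ・A Tracer agent is placed on each host or Pod
    ・Each agent stores the connection information it collects in the CMDB (Connection Management DataBase)
    ・The system administrator accesses the CMDB and retrieves dependencies that span multiple hosts and Pods
    Diagram: Tracer on Host 1, Host 2, ..., Host N -> CMDB <- Systems Administrator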
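  14. [p.32] Collecting TCP connection information

    ・The Tracer process queries the Linux kernel and obtains TCP/UDP socket information by polling
    ・It also obtains information about the OS processes that terminate each connection
      ・Socket information: /proc/net/tcp or Netlink sock_diag
      ・Process information: /proc/<pid>/{stat,fd}
    ・Low overhead, because the Tracer does not intervene in the processing of the connections
    Diagram: within a host, the Tracer polls the kernel for the TCP/UDP connections of the processes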
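To make the socket polling concrete, here is a minimal sketch that parses text in the format of /proc/net/tcp. It is not Transtracer's code: the function names `decode_addr` and `parse_proc_net_tcp` and the sample rows are made up for illustration, only two TCP states are mapped, and the address decoding assumes a little-endian host such as x86.

```python
import socket
import struct

# Subset of the kernel's TCP state codes as printed in /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}

# Made-up sample in the /proc/net/tcp layout (one listener, one established connection).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0A00000A:0050 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345 1 0000000000000000 100 0 0 10 0
   1: 0A00000A:0050 0B00000A:C350 01 00000000:00000000 00:00000000 00000000    33        0 12346 1 0000000000000000 20 4 30 10 -1
"""


def decode_addr(field):
    """Decode 'HEXIP:HEXPORT' (IPv4, little-endian host assumed) into (ip, port)."""
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_proc_net_tcp(text):
    """Return one dict per socket row, skipping the header line."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        cols = line.split()
        rows.append({
            "local": decode_addr(cols[1]),
            "remote": decode_addr(cols[2]),
            "state": TCP_STATES.get(cols[3], cols[3]),
            "inode": int(cols[9]),
        })
    return rows


sockets = parse_proc_net_tcp(SAMPLE)
for s in sockets:
    print(s)
```

On a Linux host the same function could be fed the contents of /proc/net/tcp read on each polling interval. The inode column is what links a socket to a process, since the entries under /proc/<pid>/fd for sockets are symbolic links of the form `socket:[inode]`.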
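  15. [p.33] Determining the direction of a TCP connection dependency

    ・Host Y, which requests the connection, depends on host X, which accepts it
    ・If the destination port seen from host Y is the LISTEN port M, we know that host Y is the one requesting the connection
    ・The LISTEN ports are obtained by querying the OS of host X
    Diagram: Host Y, Port N, Process B (CONNECT) -> Host X, Port M, Process A (LISTEN)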
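The direction rule on this slide can be written as a small function. This is a sketch under assumptions, not the tool's implementation: `classify` and `collect_flows` are hypothetical names, the check is made from the point of view of one host using that host's own set of LISTEN ports, and the connecting side's ephemeral port is collapsed to the label `many`, borrowing the notation of the usage example on p.37.

```python
def classify(conn, listen_ports):
    """Return (active, passive) endpoints for one connection seen on the local host.

    conn is (local_ip, local_port, remote_ip, remote_port); listen_ports is the set
    of ports the local host is listening on.
    """
    local_ip, local_port, remote_ip, remote_port = conn
    if local_port in listen_ports:
        # The peer connected to one of our LISTEN ports, so the peer depends on us.
        return (remote_ip, "many"), (local_ip, local_port)
    # Otherwise the local process opened the connection and depends on the peer.
    return (local_ip, "many"), (remote_ip, remote_port)


def collect_flows(conns, listen_ports):
    """Deduplicate connections into a sorted list of active -> passive flows."""
    return sorted({classify(c, listen_ports) for c in conns})


# Connections as they might appear on 10.0.0.10, which listens on port 80.
conns = [
    ("10.0.0.10", 80, "10.0.0.11", 50000),
    ("10.0.0.10", 80, "10.0.0.11", 50001),
    ("10.0.0.10", 41000, "10.0.0.12", 8080),
]
flows = collect_flows(conns, {80})
for active, passive in flows:
    print(f"{active[0]}:{active[1]} --> {passive[0]}:{passive[1]}")
```

Dropping the ephemeral port is what lets the two connections from 10.0.0.11 collapse into a single flow.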
  16. [p.37] Transtracer usage example

    $ ttctl --dbhost 10.0.0.20 --ipv4 10.0.0.10
    10.0.0.10:80 ('nginx', pgid=4656)
    └<-- 10.0.0.11:many ('wrk', pgid=5982)
    10.0.0.10:80 ('nginx', pgid=4656)
    └--> 10.0.0.12:8080 ('python', pgid=6111)
    10.0.0.10:many ('fluentd', pgid=2127)
    └--> 10.0.0.13:24224 ('fluentd', pgid=2001)

    Diagram: wrk on 10.0.0.11 -> nginx on 10.0.0.10 (:80) -> python on 10.0.0.12 (:8080); fluentd on 10.0.0.10 -> fluentd on 10.0.0.13 (:24224)
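  17. [p.38] Data structure for processes

    ① The Linux process group is the smallest unit of a node
    $ pstree -apg | grep nginx
    ├─nginx,627,627
    │   ├─nginx,628,627
    │   └─nginx,629,627
    ② A uniqueness constraint on (machine-id, ipv4, pgid, pname) identifies each process
    ・A process's ID changes when it restarts, so entries that differ only in pgid are deduplicated at query time
    ・machine-id is used to cope with the reuse of IP addresses (machine-id is not yet implemented)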
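The query-time deduplication of restarted processes might look like the following sketch. The slide does not say which entry is kept, so keeping the most recently recorded one is an assumption, as are the function name `dedupe_restarts`, the dictionary layout, and the expectation that rows arrive ordered from oldest to newest.

```python
def dedupe_restarts(rows):
    """Collapse rows that differ only in pgid, keeping the last one seen.

    rows is assumed to be ordered from oldest to newest.
    """
    latest = {}
    for row in rows:
        # pgid is deliberately left out of the key: a restart changes it.
        latest[(row["ipv4"], row["pname"])] = row
    return list(latest.values())


rows = [
    {"ipv4": "10.0.0.10", "pgid": 627, "pname": "nginx"},   # before a restart
    {"ipv4": "10.0.0.10", "pgid": 4656, "pname": "nginx"},  # after a restart
    {"ipv4": "10.0.0.12", "pgid": 6111, "pname": "python"},
]
unique = dedupe_restarts(rows)
print(unique)
```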
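  18. [p.39] Data structure for connection management

    ③ Nodes are classified as Active or Passive
    ・The same process can be both Active and Passive
    ・The same process may listen on multiple ports
    ④ Flows are stored as Active => Passive
    ・The port on the Active side has already been aggregated, so it is not stored (p.33)
    ・The listen port is stored only for the Passive side
    Diagram: Active nodes connect to the Passive nodes (Port N, Port M) of a process, and the same process also acts as an Active node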
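  19. [p.40] CMDB table schema (excerpt)

    Table: processes
    ・process_id: primary key that identifies the process
    ・ipv4: IP address on the host where the process runs
    ・pgid: Linux process group ID
    ・pname: process name
    ・Unique constraint: (ipv4, pgid, pname)
    Table: active_nodes
    ・node_id: primary key that identifies the node
    ・process_id: foreign key to the processes table
    ・Unique constraint: (process_id)
    Table: passive_nodes
    ・node_id: primary key that identifies the node
    ・process_id: foreign key to the processes table
    ・port: listen port number
    ・Unique constraint: (process_id, port)
    Table: flows
    ・flow_id: primary key that identifies a connection between nodes
    ・active_node_id: foreign key to the active_nodes table
    ・passive_node_id: foreign key to the passive_nodes table
    ・Unique constraint: (active_node_id, passive_node_id)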
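The schema excerpt can be tried out with a short script. This is a sketch only: the experiment on p.43 lists PostgreSQL as the CMDB, while the example below substitutes Python's built-in `sqlite3` with an in-memory database so that it runs anywhere, and the column types and the helpers `build_db` and `upsert_process` are assumptions that do not come from the slides.

```python
import sqlite3

SCHEMA = """
CREATE TABLE processes (
    process_id INTEGER PRIMARY KEY,
    ipv4 TEXT NOT NULL,
    pgid INTEGER NOT NULL,
    pname TEXT NOT NULL,
    UNIQUE (ipv4, pgid, pname)
);
CREATE TABLE active_nodes (
    node_id INTEGER PRIMARY KEY,
    process_id INTEGER NOT NULL REFERENCES processes(process_id),
    UNIQUE (process_id)
);
CREATE TABLE passive_nodes (
    node_id INTEGER PRIMARY KEY,
    process_id INTEGER NOT NULL REFERENCES processes(process_id),
    port INTEGER NOT NULL,
    UNIQUE (process_id, port)
);
CREATE TABLE flows (
    flow_id INTEGER PRIMARY KEY,
    active_node_id INTEGER NOT NULL REFERENCES active_nodes(node_id),
    passive_node_id INTEGER NOT NULL REFERENCES passive_nodes(node_id),
    UNIQUE (active_node_id, passive_node_id)
);
"""


def build_db():
    """Create an in-memory database with the four tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def upsert_process(db, ipv4, pgid, pname):
    """Insert the process if it is new and return its process_id either way."""
    db.execute(
        "INSERT OR IGNORE INTO processes (ipv4, pgid, pname) VALUES (?, ?, ?)",
        (ipv4, pgid, pname),
    )
    row = db.execute(
        "SELECT process_id FROM processes WHERE ipv4 = ? AND pgid = ? AND pname = ?",
        (ipv4, pgid, pname),
    ).fetchone()
    return row[0]


db = build_db()
wrk = upsert_process(db, "10.0.0.11", 5982, "wrk")
nginx = upsert_process(db, "10.0.0.10", 4656, "nginx")
nginx_again = upsert_process(db, "10.0.0.10", 4656, "nginx")  # same row, no duplicate

active = db.execute("INSERT INTO active_nodes (process_id) VALUES (?)", (wrk,)).lastrowid
passive = db.execute(
    "INSERT INTO passive_nodes (process_id, port) VALUES (?, ?)", (nginx, 80)
).lastrowid
db.execute(
    "INSERT INTO flows (active_node_id, passive_node_id) VALUES (?, ?)", (active, passive)
)

flows = db.execute("""
    SELECT ap.pname, pp.pname, pn.port
    FROM flows f
    JOIN active_nodes an ON an.node_id = f.active_node_id
    JOIN processes ap ON ap.process_id = an.process_id
    JOIN passive_nodes pn ON pn.node_id = f.passive_node_id
    JOIN processes pp ON pp.process_id = pn.process_id
""").fetchall()
print(flows)
```

The unique constraints let an agent record the same process or flow on every polling cycle without creating duplicate rows.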
  20. [p.43] Experimental environment

    Client
    ・CPU: Intel Xeon CPU E5-2650 v3 2.30GHz, 2 cores
    ・Memory: 1 GB
    ・Benchmarker: wrk 4.1.0-4
    Server
    ・CPU: Intel Xeon CPU E5-2650 v3 2.30GHz, 4 cores
    ・Memory: 1 GB
    ・HTTP Server: nginx 1.17.3
    CMDB
    ・CPU: Intel Xeon CPU E5-2650 v3 2.30GHz, 1 core
    ・Memory: 1 GB
    ・Database: PostgreSQL 11.3
    ・All instances were built on SAKURA Cloud
    ・Linux Kernel 4.15 (Ubuntu Server 18.04.3 LTS)
  21. [p.44] Configuration of each implementation

    ・Normal: no tracing is running
    ・Transtracer: tracing by Transtracer is running
    ・iptables NEW filter method: logs new connections only
      ・-I INPUT -m state --state NEW -m limit -j TRACE-LOG
    ・iptables ESTB filter method: logs all packets exchanged while a connection is established
      ・-I INPUT -m state --state ESTABLISHED -m limit -j TRACE-LOG
  22. [p.45] Response latency overhead

    Average latency (ms) by number of connections
    Connections    5000    10000   15000   20000
    Normal         93.1    191.6   279.3   353.8
    Transtracer    94.7    188.3   291.8   401.2
    ESTB filter    115.0   236.0   359.0   462.5
    NEW filter     113.1   214.4   310.0   449.3
    ・Compared with Normal, Transtracer adds 1.7% to 13.4% overhead
    ・Compared with the ESTB filter method of the iptables implementation, Transtracer reduces the overhead by 13% to 20%
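The percentages on this slide can be rechecked from the measured latencies. The sketch below uses only the numbers listed for this slide; the helper `relative_change` and the variable names are introduced here for illustration.

```python
connections = [5000, 10000, 15000, 20000]
latency_ms = {
    "Normal": [93.1, 191.6, 279.3, 353.8],
    "Transtracer": [94.7, 188.3, 291.8, 401.2],
    "ESTB filter": [115.0, 236.0, 359.0, 462.5],
    "NEW filter": [113.1, 214.4, 310.0, 449.3],
}


def relative_change(value, baseline):
    """Percentage change of value relative to baseline (negative means lower)."""
    return (value - baseline) / baseline * 100.0


# Transtracer against the untraced baseline and against the ESTB filter.
vs_normal = [
    relative_change(t, n)
    for t, n in zip(latency_ms["Transtracer"], latency_ms["Normal"])
]
vs_estb = [
    relative_change(t, e)
    for t, e in zip(latency_ms["Transtracer"], latency_ms["ESTB filter"])
]

for c, a, b in zip(connections, vs_normal, vs_estb):
    print(f"{c:>6} connections: {a:+.1f}% vs Normal, {b:+.1f}% vs ESTB filter")
```

The first and last rows give the 1.7% and 13.4% quoted on the slide. At 10,000 connections the measured Transtracer latency (188.3 ms) is slightly below Normal (191.6 ms), so the quoted range describes the end points of the measurement, not every point.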
  23. [p.46] CPU usage overhead

    CPU usage (%) and socket read time (ms) by number of connections
    Connections                  5000    10000   15000   20000
    ttracerd's CPU usage (%)     13.2    23.0    34.2    44.4
    ESTB filter's CPU usage (%)  72.2    75.9    78.8    78.6
    Reading sockets time (ms)    102.3   199.1   317.8   408.6
    ・At 20,000 connections, Transtracer's CPU usage is 44.4% and the ESTB filter method's CPU usage is 78.6%
      ・A 43.5% reduction in CPU usage
    ・The NEW filter method has the advantage that it uses CPU only when a connection is established
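The 43.5% figure and the growth of the socket read time can be rechecked from the numbers on this slide. The per-connection read time is a derived quantity that does not appear on the slide, and the variable names are illustrative.

```python
connections = [5000, 10000, 15000, 20000]
ttracerd_cpu = [13.2, 23.0, 34.2, 44.4]      # CPU usage (%)
estb_cpu = [72.2, 75.9, 78.8, 78.6]          # CPU usage (%)
read_time_ms = [102.3, 199.1, 317.8, 408.6]  # time to read all sockets (ms)

# Relative reduction of CPU usage compared with the ESTB filter, in percent.
cpu_reduction = [(e - t) / e * 100.0 for t, e in zip(ttracerd_cpu, estb_cpu)]

# Socket read time divided by the number of connections, in microseconds.
read_us_per_conn = [ms * 1000.0 / c for ms, c in zip(read_time_ms, connections)]

for c, r, u in zip(connections, cpu_reduction, read_us_per_conn):
    print(f"{c:>6} connections: {r:.1f}% less CPU than ESTB filter, {u:.1f} us per socket")
```

The per-socket read time stays close to 20 microseconds at every measured point, which suggests that the polling cost grows roughly linearly with the number of connections over this range.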