Transtracer: Automatically Tracing Inter-Process Dependencies by Monitoring the Endpoints of TCP/UDP Communication in Distributed Systems

IPSJ 12th Internet and Operation Technology Symposium (IOTS) 2019
Paper: https://yuuk.io/papers/transtracer_iots2019.pdf
OSS: https://github.com/yuuki/transtracer

Yuuki Tsubouchi (yuuk1)

December 06, 2019

Transcript

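  1. SAKURA internet Inc. (C) Copyright 1996-2019 SAKURA internet Inc. SAKURA internet Research Center

    Transtracer: Automatically Tracing Inter-Process Dependencies by Monitoring the Endpoints of TCP/UDP Communication in Distributed Systems
    IPSJ 12th Internet and Operation Technology Symposium (IOTS) 2019, 2019.12.6
    Yuuki Tsubouchi*1, Masahiro Furukawa*2, Ryosuke Matsumoto*1
    *1) SAKURA internet, *2) Hatena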
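  2. Background: why dependencies inside web services grow complex

    - Features are added over a long period of service operation
    - Access from users increases
    - A single service provider offers multiple services (SNS, e-commerce sites, etc.)
    - Purpose-specific middleware is added
    - Old and new systems coexist
    - Scale-out increases the number of hosts
    - Parts of the system are shared across multiple services (user authentication infrastructure, etc.)
    - Systems are split into microservices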
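  3. Research goal: trace unknown processes with low overhead

    [Figure: on a Linux host, user-space processes exchange TCP/UDP flows through the kernel; the sockets are monitored.]

    Monitor sockets, the endpoints of TCP/UDP connections, and trace connections automatically.
    1. Addressing latency overhead
       - Make socket monitoring independent of the processes' communication
    2. Addressing false negatives
       - Monitor all sockets
       - Packets can number in the tens of thousands per second
       - Sockets can be reused, so the number of sockets tends to be smaller than the number of packets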
  4. Request-based approach

    - Assign an identifier to each application-layer request, embed it in the follow-on requests, and propagate it to the downstream processes
    - Use the identifier to trace which processes in the system handled the request

    M. Y. Chen et al., Pinpoint: Problem Determination in Large, Dynamic Internet Services, IEEE/IFIP International Conference on DSN, pp. 595–604, 2002.
    P. Barham et al., Magpie: Online Modelling and Performance-aware Systems, HotOS, pp. 85–90, 2003.
    R. Fonseca et al., X-Trace: A Pervasive Network Tracing Framework, USENIX Conference on NSDI, pp. 20–20, 2007.
    B. H. Sigelman et al., Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, Technical report, Google, 2010.
  5. Packet-based approach

    - Trace dependencies from network-layer packets
    - Collect packets from existing traffic and analyze information such as the IP addresses and ports in the packet headers and the packet send and receive times
    - Exploit the time differences in traffic volume along each path: find patterns in which traffic characteristics correlate across paths, and infer dependencies from them

    P. Bahl et al., Towards Highly Reliable Enterprise Network Services via Inference of Multi-Level Dependencies, ACM SIGCOMM Computer Communication Review, Vol. 37, No. 4, pp. 13–24, 2007.
    X. Chen et al., Automating Network Application Dependency Discovery: Experiences, Limitations, and New Solutions, USENIX Symposium on OSDI, pp. 117–130, 2008.
    L. Popa et al., Macroscope: End-Point Approach to Networked Application Dependency Discovery, CoNEXT, pp. 229–240, 2009.
    A. Natarajan et al., NSDMiner: Automated Discovery of Network Service Dependencies, IEEE INFOCOM, pp. 2507–2515, 2012.
    A. Zand et al., Rippler: Delay Injection for Service Dependency Detection, IEEE INFOCOM, pp. 2157–2165, 2014.
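  6. Proposed method: socket-based approach

    [Figure: on a Linux host, user-space processes exchange TCP/UDP flows through the kernel; the sockets are monitored.]

    - Socket monitoring is independent of the processes' communication, so latency overhead can be reduced
    - When connections are reused, all sockets can be monitored with low overhead
    - Sockets make it possible to associate processes with connections
    - Tracing is transparent as long as a process uses sockets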
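  7. Transtracer system architecture

    [Figure: a Tracer runs on each of Host 1, Host 2, ..., Host N and writes to the CMDB; the systems administrator queries the CMDB.]

    - A Tracer process is placed on each host
    - Each Tracer process stores the connection information it collects in the CMDB (connection management database)
    - The systems administrator accesses the CMDB and obtains dependencies that span multiple hosts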
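  8. Acquiring connection information

    [Figure: on a host, the Tracer polls the kernel while processes exchange TCP/UDP flows. Because it does not intervene in their processing, overhead is low.]

    - The Tracer process queries the Linux kernel and obtains TCP/UDP socket information by polling
    - It also obtains information about the OS processes that terminate each connection
    - Socket information: /proc/net/tcp or Netlink sock_diag
    - Process information: /proc/<pid>/{stat,fd}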
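The socket side of this polling can be illustrated with a minimal sketch that parses text in the /proc/net/tcp format. It is not Transtracer's implementation. The sample text, the function names, and the little-endian host assumption used to decode IPv4 addresses are illustrative assumptions. The `socket:[inode]` form is how Linux presents socket file descriptors under /proc/<pid>/fd, which is what lets a socket be tied back to a process.

```python
import re
import socket
import struct

# Subset of the kernel's TCP state codes as printed in /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}

# Hypothetical sample in /proc/net/tcp format (not captured from a real host).
SAMPLE = """  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0A00000A:0050 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 20001 1 0000000000000000 100 0 0 10 0
   1: 0A00000A:0050 0B00000A:C350 01 00000000:00000000 00:00000000 00000000    33        0 20002 1 0000000000000000 20 4 30 10 -1
"""


def parse_addr(hex_addr):
    """Decode 'HEXIP:HEXPORT'; assumes a little-endian host such as x86."""
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_proc_net_tcp(text):
    """Return one dict per socket row, skipping the header line."""
    rows = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 10:
            continue
        rows.append({
            "local": parse_addr(fields[1]),
            "remote": parse_addr(fields[2]),
            "state": TCP_STATES.get(fields[3], fields[3]),
            "inode": int(fields[9]),
        })
    return rows


def inode_from_fd_link(target):
    """Extract the socket inode from a /proc/<pid>/fd symlink target, if any."""
    match = re.fullmatch(r"socket:\[(\d+)\]", target)
    return int(match.group(1)) if match else None


rows = parse_proc_net_tcp(SAMPLE)
```

Matching each row's `inode` against the inodes found under a process's fd directory gives the process-to-connection association described on the slide.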
  9. Transtracer usage example

    $ ttctl --dbhost 10.0.0.20 --ipv4 10.0.0.10
    10.0.0.10:80 ('nginx', pgid=4656)
    └<-- 10.0.0.11:many ('wrk', pgid=5982)
    10.0.0.10:80 ('nginx', pgid=4656)
    └--> 10.0.0.12:8080 ('python', pgid=6111)
    10.0.0.10:many ('fluentd', pgid=2127)
    └--> 10.0.0.13:24224 ('fluentd', pgid=2001)

    [Figure: dependency graph of the output above: wrk (10.0.0.11) connects to nginx (10.0.0.10) on :80, nginx connects to python (10.0.0.12) on :8080, and fluentd (10.0.0.10) connects to fluentd (10.0.0.13) on :24224.]
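  10. Process data structure

    ① Use the Linux process group as the smallest unit of a node

    $ pstree -apg | grep nginx
      ├─nginx,627,627
      │   ├─nginx,628,627
      │   └─nginx,629,627

    ② Put a unique constraint on processes using (ipv4, pgid, pname)
    A process's ID changes when it restarts, so at query time entries that differ only in pgid are deduplicated.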
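The unique constraint and the query-time deduplication can be sketched with an in-memory SQLite database. This is an illustration under assumptions, not Transtracer's schema: the table and column names, the `seen_at` timestamp, and the "keep the most recently seen pgid" rule are all hypothetical, and SQLite stands in for the CMDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE nodes (
        id      INTEGER PRIMARY KEY,
        ipv4    TEXT    NOT NULL,
        pgid    INTEGER NOT NULL,
        pname   TEXT    NOT NULL,
        seen_at INTEGER NOT NULL,
        UNIQUE (ipv4, pgid, pname)   -- one row per process group on a host
    )
    """
)


def record_node(ipv4, pgid, pname, seen_at):
    """Insert a node, or refresh its timestamp if the unique key already exists."""
    try:
        conn.execute(
            "INSERT INTO nodes (ipv4, pgid, pname, seen_at) VALUES (?, ?, ?, ?)",
            (ipv4, pgid, pname, seen_at),
        )
    except sqlite3.IntegrityError:
        conn.execute(
            "UPDATE nodes SET seen_at = ? WHERE ipv4 = ? AND pgid = ? AND pname = ?",
            (seen_at, ipv4, pgid, pname),
        )


def current_nodes():
    """Deduplicate rows that differ only in pgid, keeping the latest one seen."""
    latest = {}
    query = "SELECT ipv4, pname, pgid FROM nodes ORDER BY seen_at"
    for ipv4, pname, pgid in conn.execute(query):
        latest[(ipv4, pname)] = pgid  # later rows overwrite earlier ones
    return latest


record_node("10.0.0.10", 627, "nginx", 100)
record_node("10.0.0.10", 627, "nginx", 101)   # same process group, polled again
record_node("10.0.0.10", 4656, "nginx", 200)  # nginx restarted with a new pgid
```

After these three calls the table holds two rows, and `current_nodes()` reports a single nginx node for the host.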
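  11. Data structure for connection management

    ③ Classify nodes as Active or Passive
    ④ Store Active => Passive flows

    [Figure: Active nodes connect to Port N and Port M of a Passive process; that process also acts as an Active node toward another Passive node.]

    - The same process can be both Active and Passive
    - The same process may listen on multiple ports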
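One way to picture the Active/Passive classification is the sketch below. It assumes a connection is Passive on a host when its local port is one the host listens on, and Active otherwise. That heuristic, the function name, and the tuple layout are illustrative assumptions rather than the tool's confirmed logic. Flows are keyed by the Passive endpoint, so the many ephemeral ports on the Active side collapse into one entry.

```python
from collections import defaultdict


def build_flows(host_ip, listening_ports, connections):
    """Aggregate connections into Active => Passive flows.

    connections: iterable of (local_port, remote_ip, remote_port) seen on host_ip.
    Returns {(active_ip, passive_ip, passive_port): connection_count}.
    """
    flows = defaultdict(int)
    for local_port, remote_ip, remote_port in connections:
        if local_port in listening_ports:
            # The host accepted this connection, so it is the Passive side.
            flows[(remote_ip, host_ip, local_port)] += 1
        else:
            # The host opened this connection, so it is the Active side.
            flows[(host_ip, remote_ip, remote_port)] += 1
    return dict(flows)


# Hypothetical connections on 10.0.0.10, which listens on port 80.
flows = build_flows(
    "10.0.0.10",
    {80},
    [
        (80, "10.0.0.11", 50001),
        (80, "10.0.0.11", 50002),
        (40000, "10.0.0.12", 8080),
        (40001, "10.0.0.13", 24224),
    ],
)
```

Here the host is Passive for the two inbound connections on port 80 and Active for the two outbound ones, so one host contributes to both kinds of node.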
  12. Details of the experimental environment

    Role    Item          Specification
    Client  CPU           Intel Xeon CPU E5-2650 v3 2.30GHz, 2 cores
            Memory        1 GB
            Benchmarker   wrk 4.1.0-4
    Server  CPU           Intel Xeon CPU E5-2650 v3 2.30GHz, 4 cores
            Memory        1 GB
            HTTP Server   nginx 1.17.3
    CMDB    CPU           Intel Xeon CPU E5-2650 v3 2.30GHz, 1 core
            Memory        1 GB
            Database      PostgreSQL 11.3

    - All instances were built on SAKURA Cloud
    - Linux Kernel 4.15 (Ubuntu Server 18.04.3 LTS)
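  13. Implementations used in the experiments

    1. Normal: no tracing is performed
    2. Transtracer: the proposed method (https://github.com/yuuki/transtracer v0.1.0)
       - The polling interval is 1 second
    3. iptables NEW filter method: logs new connections only
    4. iptables ESTB filter method: logs the packets exchanged while a connection is established, without sampling
       - With the random sampling used by prior methods, a connection that is long-lived but carries little traffic may be missed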
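The sampling concern can be made concrete with a small calculation. The slide does not say which sampling scheme prior methods use, so this sketch assumes that each packet is kept independently with a fixed probability. Under that assumption, a connection that sends n packets is missed entirely with probability (1 - p)^n. The function name and the example rates are illustrative.

```python
def miss_probability(packets, sampling_rate):
    """Probability that none of a connection's packets is sampled.

    Assumes independent per-packet sampling with probability `sampling_rate`.
    """
    return (1.0 - sampling_rate) ** packets


# A long-lived but quiet connection versus a busy one, at a 1% sampling rate.
quiet = miss_probability(10, 0.01)
busy = miss_probability(10_000, 0.01)
```

With these inputs the quiet connection is more likely to be missed than seen, while the busy one is almost certain to be observed.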
  14. Response latency overhead

    Average latency (ms) by number of connections

    Connections    5000    10000   15000   20000
    Normal         93.1    191.6   279.3   353.8
    Transtracer    94.7    188.3   291.8   401.2
    ESTB filter    115.0   236.0   359.0   462.5
    NEW filter     113.1   214.4   310.0   449.3

    - Compared with Normal, Transtracer adds 1.7-13.4% latency overhead
    - Compared with the ESTB filter method of the iptables implementation, Transtracer reduces latency overhead by 13-20%
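The percentages on this slide can be recomputed from the latency figures as relative changes. The sketch below uses only the numbers shown on the slide; the helper name is illustrative. At 10,000 connections the measured Transtracer latency is slightly below Normal, so the relative change there is negative.

```python
CONNECTIONS = [5000, 10000, 15000, 20000]
LATENCY_MS = {
    "Normal": [93.1, 191.6, 279.3, 353.8],
    "Transtracer": [94.7, 188.3, 291.8, 401.2],
    "ESTB filter": [115.0, 236.0, 359.0, 462.5],
    "NEW filter": [113.1, 214.4, 310.0, 449.3],
}


def relative_change(value, baseline):
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


# Transtracer versus the untraced baseline, per connection count.
vs_normal = [
    round(relative_change(t, n), 1)
    for t, n in zip(LATENCY_MS["Transtracer"], LATENCY_MS["Normal"])
]

# Transtracer versus the iptables ESTB filter method.
vs_estb = [
    round(relative_change(t, e), 1)
    for t, e in zip(LATENCY_MS["Transtracer"], LATENCY_MS["ESTB filter"])
]
```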
  15. CPU utilization overhead

    CPU usage (%) and socket reading time (ms) by number of connections

    Connections                    5000    10000   15000   20000
    ttracerd's CPU usage (%)       13.2    23.0    34.2    44.4
    ESTB filter's CPU usage (%)    72.2    75.9    78.8    78.6
    Reading sockets time (ms)      102.3   199.1   317.8   408.6

    - At 20,000 connections, Transtracer's CPU utilization is 44.4% and the ESTB filter method's is 78.6%
    - This is a 43.5% reduction in CPU utilization
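The 43.5% figure is a relative reduction, (78.6 - 44.4) / 78.6, not a difference in percentage points. The sketch below recomputes it from the slide's numbers. As an added derived quantity that the slide does not state, it also divides the socket reading time by the connection count to get an approximate per-socket cost. Variable names are illustrative.

```python
CONNECTIONS = [5000, 10000, 15000, 20000]
TTRACERD_CPU = [13.2, 23.0, 34.2, 44.4]     # percent
ESTB_CPU = [72.2, 75.9, 78.8, 78.6]         # percent
READ_MS = [102.3, 199.1, 317.8, 408.6]      # socket reading time in milliseconds

# Relative CPU reduction of Transtracer versus the ESTB filter method.
cpu_reduction = [
    round((estb - tt) / estb * 100.0, 1)
    for tt, estb in zip(TTRACERD_CPU, ESTB_CPU)
]

# Approximate reading cost per connection, in microseconds.
per_socket_us = [ms * 1000.0 / n for ms, n in zip(READ_MS, CONNECTIONS)]
```

The per-socket cost stays close to 20 microseconds across the four measurements, which is consistent with reading time growing roughly linearly with the number of connections.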