
Design and Implementation of an Observability Tool Focused on the Relationships Between Processes in a Distributed System / Transtracer CNDK2019

CloudNative Days KANSAI 2019

Yuuki Tsubouchi (yuuk1)

November 28, 2019

Transcript

  1. SAKURA internet Inc.
    (C) Copyright 1996-2019 SAKURA internet Inc
    SAKURA internet Research Center
    Design and Implementation of an Observability Tool Focused on the Relationships Between Processes in a Distributed System
    Yuuki Tsubouchi / @yuuk1t
    CloudNative Days Kansai 2019
    2019.11.28

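  2. Self-introduction
    Yuuki Tsubouchi / ゆううき
    https://yuuk.io/
    @yuuk1t
    id:y_uuki
    Site Reliability Engineering (SRE) Researcher
    Career
    ・Hatena Co., Ltd.: web operations engineer / SRE, developing and operating web services (5 years)
    ・SAKURA internet Inc. (present): researcher at SAKURA internet Research Center, researching Internet infrastructure technology
    ・Steering committee member, IPSJ Special Interest Group on Internet and Operation Technology
    ・Lecturer, Security Camp National Convention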

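  3. Motivation for this talk
    ・In Japan, the dominant topic is how to make good "use" of existing CloudNative software
    ・I also want to talk about "building" CloudNative software
    ・At this point the work is still a Proof of Concept (PoC), so it will not immediately help with the problem in front of you
    ・Even so, I want to encourage the motivation to "build"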

  4. Transtracer: Automatically Tracing Dependencies Between Processes by Monitoring the Endpoints of TCP/UDP Communication in Distributed Systems
    https://yuuk.io/papers/transtracer_iots2019.pdf

  5. Contents
    1. Introduction
    2. Observability and tracing techniques
    3. Design and implementation of Transtracer
    4. Experiments
    5. Future work
    6. Summary

  6. 1. Introduction

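  7. Background to the growing complexity of distributed applications
    Drivers
    ・Features are added while a service is provided over a long period
    ・Access from users increases
    ・A single service provider offers multiple services (SNS, e-commerce sites, and so on)
    Consequences
    ・Purpose-specific middleware is added
    ・Old and new systems are mixed
    ・The number of hosts grows through scale-out
    ・Parts of multiple services are shared (user authentication infrastructure, for example)
    ・Microservices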

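  8. Problems that arise from growing complexity
    ・When changing a component, the range affected by the change is unknown
      ・Either the investigation takes time or the change is abandoned
    ・When a failure occurs, the correlations and causal relationships between anomalies in different places are unclear
      ・Administrators manually cross-check metrics and logs to find the cause
    ・Dependencies keep changing while the system is running, so keeping documentation up to date by hand takes a lot of effort
    Dependencies between processes that are unknown to the system administrator need to be traced automatically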

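  9. Existing automatic tracing technique: distributed tracing
    ・Traces which path an L7 request takes through the system
    ・Requires processing that embeds an identifier into requests on the communication path and processing that records trace data
    ・Latency overhead: the overhead of embedding the identifier
    ・False negatives: identifiers are embedded only after the dependencies are already understood, so the technique is not suited to discovering unknown processes
    ・Instrumentation cost: tracing code has to be added to existing components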

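  10. An idea for solving the issues
    Goal: trace dependencies on processes that are unknown to the administrator
    Issues: 1. false negatives in dependency tracing 2. latency overhead
    Solution: monitor sockets, the endpoints of TCP/UDP connections, and trace flows automatically
    ・Anything that uses sockets can be traced exhaustively
    ・Socket monitoring is independent of the processes' own communication, so it adds no latency overhead
    [Figure: on a Linux OS, processes in user space exchange TCP/UDP flows through the kernel; the sockets are monitored]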

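  11. Retrieving dependencies starting from a host/Pod
    ・Retrieve the dependencies starting from the node that is the target of a change
    ・This identifies the range that the upcoming change will affect
    ・Use case: software changes the system automatically
      ・If the affected range includes a component whose error budget (in SRE terms) is running low, the administrator is notified first
    [Figure: dependency graph with the Target node highlighted]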
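The idea of starting from the node to be changed and collecting everything it can affect can be sketched as a graph traversal. This is only an illustrative sketch: the function name `impacted_by`, the node names, and the representation of a flow as an `(active, passive)` pair (the active side depends on the passive side) are assumptions for the example, not Transtracer's actual API.

```python
from collections import deque


def impacted_by(target, flows):
    """Return every node that depends on `target`, directly or transitively.

    flows: iterable of (active, passive) pairs, read as
    "active connects to passive, so active depends on passive".
    """
    # Index the flows by the node being depended on.
    dependents = {}
    for active, passive in flows:
        dependents.setdefault(passive, set()).add(active)

    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, ()):
            if dependent != target and dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    # Hypothetical flows: client -> web -> app, web -> logger
    flows = [("client", "web"), ("web", "app"), ("web", "logger")]
    print(sorted(impacted_by("app", flows)))
```

A change to `app` in this hypothetical graph reaches `web` and then `client`, while `logger` is unaffected because nothing that depends on `app` is reached through it.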

  12. 2. Observability and tracing techniques

  13. The growing complexity of distributed applications

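  14. Composition of distributed applications over the last 10 years or so
    ・NoSQL servers and similar components are added on top of the three-tier web architecture
    [Figure: Web server, Application server, RDB server, KVS server, Full text search server, Batch server, Internal DNS server, and External DNS Server, connected by application flows and DNS flows]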

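  15. Composition of a single Host/Pod
    ・Connection management is delegated to reverse proxies and sidecar proxies
    ・Log collection agents and monitoring agents are co-located on the host
    [Figure: Main network process, Proxy, User Authentication, DNS forwarder, Log collector agent, Monitoring agent]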

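  16. The spread of architectures in which multiple systems are coupled
    ・Connection management is delegated to reverse proxies and sidecar proxies
    ・Log collection agents and monitoring agents are co-located on the host
    [Figure: systems coupled through a Message Queue and a Reverse Proxy]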

  17. Observability

  18. Observability
    Reference: [Sridharan 17] Cindy Sridharan, Monitoring in the time of Cloud Native, Velocity, 2017.
    [Figure: Low Observability vs. High Observability. Both sides show Human, Systems, and Monitoring Systems with Logs and Metrics, and the activities Alerting, Checking, and Investigating; the low side relies on top, sar, iostat, tail …, while the high side adds Traces and Observability Systems]

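  19. Observability in the context of SRE
    SRE is a body of techniques for controlling reliability and increasing the speed of change
    [Figure: Reliability axis from 99% to 100% with the control margin marked]
    In general, changing a system carries the risk of lowering its reliability
    The key is how small the risk, and therefore the control margin, can be made
    Raising observability makes it possible to
    ・predict the scale of a risk (the purpose of Transtracer)
    ・keep a risk that has materialized to a minimum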

  20. Classification of tracing techniques

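  21. Classification of tracing techniques
    ・Request-based approach: traces which communication path an application-layer request takes
    ・Connection-based approach: traces dependencies by obtaining the state of transport connections on the end hosts
    ・Packet-based approach: traces dependencies by observing packet headers on hosts or switches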

  22. Request-based approach
    ・Assign an identifier to each Layer 7 request, embed it in the subsequent requests, and propagate it to the downstream processes
    ・Using the identifier, trace which processes in the system the request passed through
    ・Advantage: application-level processing and L7 protocol information can be traced
    ・Issues: latency overhead, false negatives, instrumentation cost (same as slide 9)
    M. Y. Chen, et al., Pinpoint: Problem Determination in Large, Dynamic Internet Services, IEEE/IFIP International Conference on DSN, pp. 595–604 2002.
    P. Barham, et al., Magpie: Online Modelling and Performance-aware Systems, HotOS, pp. 85–90 2003.
    R. Fonseca, et al., X-Trace: A Pervasive Network Tracing Framework, USENIX Conference on NSDI, pp. 20–20 2007.
    B. H. Sigelman, et al., Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, Technical report, Google 2010.
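The identifier propagation used by the request-based approach can be illustrated with a toy in-process model. This is a minimal sketch, not the mechanism of any particular tracing system: the header name `X-Trace-Id`, the `handle` function, and the in-memory `trace_log` are all assumptions made for the example.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name


def handle(service, headers, downstream, trace_log):
    """Record that `service` handled the request, then forward it downstream."""
    # Reuse the incoming identifier, or assign one at the entry point.
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    trace_log.append((trace_id, service))
    if downstream:
        next_service, rest = downstream[0], downstream[1:]
        # Embed the same identifier in the subsequent request.
        handle(next_service, {TRACE_HEADER: trace_id}, rest, trace_log)
    return trace_id


def path_of(trace_id, trace_log):
    """Reconstruct the services one request passed through, in order."""
    return [service for tid, service in trace_log if tid == trace_id]


if __name__ == "__main__":
    log = []
    tid = handle("web", {}, ["app", "db"], log)
    print(path_of(tid, log))
```

Every service on the path has to run this kind of code, which is where the instrumentation cost and the false negatives for uninstrumented processes come from.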

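  23. Google Dapper [Sigelman 2010]
    ・The prototype of today's distributed tracing technologies (OpenTelemetry and others)
    ・Characterized by low overhead and application-level transparency
    ・Adaptive sampling
      ・The sampling rate can be changed according to the traffic volume
    ・RPC instrumentation library
      ・A C++ library that creates spans, writes logs, and performs sampling
    [Figure 5 of the paper: Dapper collection pipeline]
    [Sigelman 2010]: B. H. Sigelman, et al., Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, Technical report, Google 2010.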

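  24. Facebook Canopy [Kaldor 2017]
    An extension of Dapper that makes it possible to join heterogeneous data
    ・End-to-end (e2e) tracing that covers browsers, mobile apps, and backends
    ・Issue 1: e2e performance data is heterogeneous in execution model, granularity, and quality
    ・Issue 2: the volume of trace data is enormous
    ・Builds a pipeline for extracting performance data from the whole stack
    ・Each stage of the pipeline can be customized
    [Figure 2 of the paper]
    [Kaldor 2017]: J. Kaldor, et al., Canopy: An end-to-end performance tracing and analysis system, USENIX SOSP, 2017.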

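  25. Connection-based approach [Clawson 15]
    ・Uses the Linux packet filter (iptables) to detect Layer 4 connections
    ・iptables collects connection-start logs, and a log collector stores them in a central database
    ・Dependencies can be traced as long as the Linux kernel's L4 communication mechanism is used
    ・Issue 1: it intervenes in per-packet processing inside the Linux kernel, so it adds latency overhead to the processes' communication
    ・Issue 2: a connection that has already been made persistent cannot be detected partway through
    ・Issue 3: tracing is possible per host but not per process
    [Clawson 15] J. K. Clawson, Service Dependency Analysis via TCP/UDP Port Tracing, Master’s thesis, Brigham Young University-Provo 2015.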

  26. Packet-based approach
    ・Traces dependencies based on Layer 3 packets
    ・Collects packets from existing traffic and analyzes information in the packet headers, such as the source and destination hosts and ports and the times at which packets were sent and received
    ・I have not yet seen it used in a real cloud environment
    ・Details omitted
    P. Bahl, et al.: Towards Highly Reliable Enterprise Network Services via Inference of Multi-Level Dependencies, ACM SIGCOMM Review, Vol. 37, No. 4, pp. 13–24 2007.
    X. Chen, et al.: Automating Network Application Dependency Discovery: Experiences, Limitations, and New Solutions, USENIX Symposium on OSDI, pp. 117–130 2008.
    P. Lucian, et al.: Macroscope: End-Point Approach to Networked Application Dependency Discovery, CoNEXT, pp. 229–240 2009.
    A. Natarajan, et al.: NSDMiner: Automated Discovery of Network Service Dependencies, IEEE INFOCOM, pp. 2507–2515 2012.
    A. Zand, et al.: Rippler: Delay Injection for Service Dependency Detection, IEEE INFOCOM, pp. 2157–2165 2014.

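  27. Summary of the issues of each approach
    Request-based approach
    ・Instrumentation cost
      ・The effort of adding tracing code inside existing components
    ・False negatives
      ・Tracing code is added by hand, so coverage is incomplete
      ・Because of the instrumentation cost, it tends to be introduced in only part of a system
    Connection-based approach
    ・False negatives
      ・A connection that has already been made persistent cannot be detected partway through
    ・The unit of tracing is the IP address rather than the process
    Both approaches
    ・Latency overhead: the overhead of inserting extra processing into the communication path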

  28. 3. Design and implementation of Transtracer

  29. Design

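  30. Transtracer requirements and solutions
    1. Lower instrumentation cost
      ・Adopt the connection-based approach
    2. Dependency tracing per process
      ・Obtain connection information from sockets, the endpoints of connections
    3. Fewer false negatives in environments with persistent connections
      ・Monitor the connection information held by sockets by polling
    4. Low latency overhead
      ・Socket monitoring is independent of the processes' own processing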

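  31. Transtracer system architecture
    ・A Tracer agent is placed on each host or Pod
    ・Each agent stores the connection information it obtains in the CMDB (Connection Management DataBase)
    ・The system administrator accesses the CMDB to retrieve dependencies that span multiple hosts and Pods
    [Figure: Tracer agents on Host 1, Host 2, ... Host N send data to the CMDB, which the Systems Administrator queries]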

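  32. Collecting TCP connection information
    ・The Tracer process queries the Linux kernel and obtains TCP/UDP socket information by polling
    ・It also obtains information on the OS processes that terminate the connections
      ・Socket information: /proc/net/tcp or Netlink sock_diag
      ・Process information: /proc//{stat,fd}
    ・Overhead is low because the Tracer does not intervene in the processes' communication
    [Figure: on a host, the Tracer polls the kernel for the TCP/UDP connections of the processes]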
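Reading socket information from /proc/net/tcp amounts to parsing a whitespace-separated text table. The sketch below is one possible way to do it and is not taken from the Transtracer source: the function names are hypothetical, only two TCP states are mapped, and the address decoding assumes a little-endian host such as x86, where the kernel prints the IPv4 address with its bytes reversed.

```python
import socket
import struct

# Only the two states relevant here; other codes are returned unchanged.
TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}


def parse_addr(hex_addr):
    """Decode 'AABBCCDD:PPPP' as printed in /proc/net/tcp (IPv4 only)."""
    ip_hex, port_hex = hex_addr.split(":")
    # Assumption: little-endian host, so the address bytes are reversed.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_proc_net_tcp(text):
    """Return one dict per socket listed in the given /proc/net/tcp content."""
    rows = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        rows.append({
            "local": parse_addr(fields[1]),
            "remote": parse_addr(fields[2]),
            "state": TCP_STATES.get(fields[3], fields[3]),
            "inode": int(fields[9]),  # matched against /proc/<pid>/fd entries
        })
    return rows


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        for row in parse_proc_net_tcp(f.read()):
            print(row)
```

The socket inode is the piece that lets a tracer connect a socket to a process, because the entries under a process's fd directory are symlinks of the form `socket:[inode]`.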

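  33. Determining the direction of a TCP connection dependency
    [Figure: Process B on Host Y connects (CONNECT) from port N to port M on Host X, where Process A is listening (LISTEN)]
    ・Host Y, which requests the connection, depends on Host X, which accepts it
    ・If, seen from Host Y, the destination port is the LISTEN port M, then Host Y is the side requesting the connection
    ・The LISTEN ports are obtained by querying the OS of Host X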
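The direction rule can be written down as a small function evaluated on each host. This is a simplified sketch under stated assumptions: `classify` is a hypothetical name, a connection is given as a `(local_ip, local_port, remote_ip, remote_port)` tuple as seen on one host, and `listen_ports` is the set of ports in the LISTEN state on that same host.

```python
def classify(conn, listen_ports):
    """Return (active_ip, passive_ip, passive_port) for one connection.

    conn: (local_ip, local_port, remote_ip, remote_port) as seen on this host.
    listen_ports: ports that are in the LISTEN state on this host.
    """
    local_ip, local_port, remote_ip, remote_port = conn
    if local_port in listen_ports:
        # The peer connected to one of our listening ports:
        # the peer is the requester and depends on this host.
        return remote_ip, local_ip, local_port
    # Otherwise this host opened the connection from an ephemeral port
    # and depends on the remote listener.
    return local_ip, remote_ip, remote_port


if __name__ == "__main__":
    print(classify(("10.0.0.10", 80, "10.0.0.11", 51234), {80}))
    print(classify(("10.0.0.10", 40000, "10.0.0.12", 8080), {80}))
```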

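  34. Aggregating ephemeral ports
    ・Collecting all connection information makes the amount of data stored in the CMDB large, so redundant information is reduced
    ・Ephemeral port: a random source port assigned by the kernel
    ・A given LISTEN port receives connections from multiple ephemeral ports
    ・These flows are aggregated and treated as a single aggregated flow
    [Figure: several ephemeral ports of one process connect to one LISTEN port of another process and are aggregated into a single flow]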
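Aggregation by dropping the ephemeral source port can be sketched in a few lines. The function name `aggregate` and the tuple layout are assumptions for illustration; the connection count kept per aggregated flow is an extra that the slides do not mention.

```python
from collections import Counter


def aggregate(flows):
    """Collapse flows that differ only in their ephemeral source port.

    flows: iterable of (src_ip, src_port, dst_ip, dst_port), where dst_port
    is the LISTEN port. Returns {(src_ip, dst_ip, dst_port): connection_count}.
    """
    counts = Counter(
        (src_ip, dst_ip, dst_port)
        for src_ip, _src_port, dst_ip, dst_port in flows
    )
    return dict(counts)


if __name__ == "__main__":
    flows = [
        ("10.0.0.11", 51001, "10.0.0.10", 80),
        ("10.0.0.11", 51002, "10.0.0.10", 80),
        ("10.0.0.11", 51003, "10.0.0.10", 80),
        ("10.0.0.10", 40000, "10.0.0.12", 8080),
    ]
    print(aggregate(flows))
```

Three connections from different ephemeral ports on 10.0.0.11 become one aggregated flow to 10.0.0.10:80, which is why the CLI output later shows the source port as `many`.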

  35. Implementation
    https://github.com/yuuki/transtracer

  36. Software components
    ・CMDB: PostgreSQL 11.3
    ・ttracerd: the Tracer agent
    ・ttctl: a CLI that queries the CMDB
    [Figure: ttracerd on each Host/Pod sends trace data to the CMDB (PostgreSQL), which ttctl queries]

  37. Example of using Transtracer
    $ ttctl --dbhost 10.0.0.20 --ipv4 10.0.0.10
    10.0.0.10:80 ('nginx', pgid=4656)
    └<-- 10.0.0.11:many ('wrk', pgid=5982)
    10.0.0.10:80 ('nginx', pgid=4656)
    └--> 10.0.0.12:8080 ('python', pgid=6111)
    10.0.0.10:many ('fluentd', pgid=2127)
    └--> 10.0.0.13:24224 ('fluentd', pgid=2001)
    [Figure: 10.0.0.11 (wrk) connects to 10.0.0.10:80 (nginx), which connects to 10.0.0.12:8080 (python); fluentd on 10.0.0.10 connects to 10.0.0.13:24224 (fluentd)]

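  38. Data structure for processes
    ① The Linux process group is the smallest unit of a node
    $ pstree -apg | grep nginx
    ├─nginx,627,627
    │ ├─nginx,628,627
    │ └─nginx,629,627
    ② A uniqueness constraint on (machine-id, ipv4, pgid, pname) identifies a process
    ・A process gets a new ID when it restarts, so entries that differ only in pgid are deduplicated at query time
    ・machine-id is used to cope with the reuse of IP addresses (machine-id is not yet implemented)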

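  39. Data structure for connection management
    ③ Classify nodes as Active or Passive
    ④ Store Active => Passive flows
    ・Ports on the Active side have already been aggregated, so they are not kept (see slide 34)
    ・Only the Passive side keeps its listening port
    ・The same process can be both Active and Passive
    ・The same process may listen on multiple ports
    [Figure: a process acting as Passive on Port N and Port M and also as Active toward another node]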

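  40. CMDB table schema (excerpt)
    Table          Key                Description
    processes      process_id         Primary key that identifies a process
                   ipv4               IP address on the host where the process runs
                   pgid               Linux process group ID
                   pname              Process name
                   Unique constraint  (ipv4, pgid, pname)
    active_nodes   node_id            Primary key that identifies a node
                   process_id         Foreign key to the processes table
                   Unique constraint  (process_id)
    passive_nodes  node_id            Primary key that identifies a node
                   process_id         Foreign key to the processes table
                   port               Listening port number
                   Unique constraint  (process_id, port)
    flows          flow_id            Primary key that identifies a connection between nodes
                   active_node_id     Foreign key to the active_nodes table
                   passive_node_id    Foreign key to the passive_nodes table
                   Unique constraint  (active_node_id, passive_node_id)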
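To make the keys and constraints concrete, the excerpt can be written out as DDL. The sketch below uses Python's built-in sqlite3 with an in-memory database purely so that it runs without a server; the actual CMDB is PostgreSQL, and the column types, `NOT NULL` markers, and the helper name `new_cmdb` are assumptions rather than the project's real schema.

```python
import sqlite3

# Tables, keys, and unique constraints follow the schema excerpt;
# the column types are guesses made for this illustration.
SCHEMA = """
CREATE TABLE processes (
    process_id INTEGER PRIMARY KEY,
    ipv4       TEXT    NOT NULL,
    pgid       INTEGER NOT NULL,
    pname      TEXT    NOT NULL,
    UNIQUE (ipv4, pgid, pname)
);
CREATE TABLE active_nodes (
    node_id    INTEGER PRIMARY KEY,
    process_id INTEGER NOT NULL REFERENCES processes (process_id),
    UNIQUE (process_id)
);
CREATE TABLE passive_nodes (
    node_id    INTEGER PRIMARY KEY,
    process_id INTEGER NOT NULL REFERENCES processes (process_id),
    port       INTEGER NOT NULL,
    UNIQUE (process_id, port)
);
CREATE TABLE flows (
    flow_id         INTEGER PRIMARY KEY,
    active_node_id  INTEGER NOT NULL REFERENCES active_nodes (node_id),
    passive_node_id INTEGER NOT NULL REFERENCES passive_nodes (node_id),
    UNIQUE (active_node_id, passive_node_id)
);
"""


def new_cmdb():
    """Create an empty in-memory stand-in for the CMDB."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


if __name__ == "__main__":
    db = new_cmdb()
    db.execute(
        "INSERT INTO processes (ipv4, pgid, pname) VALUES (?, ?, ?)",
        ("10.0.0.10", 4656, "nginx"),
    )
    print(db.execute("SELECT * FROM processes").fetchall())
```

The unique constraint on flows is what keeps repeated polling from storing the same Active => Passive pair more than once.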

  41. 4. Experiments

  42. Evaluation experiments
    1. Evaluate the response latency overhead imposed on the application
    2. Evaluate the CPU usage overhead of obtaining connection information

  43. Experimental environment
    Role    Item         Specification
    Client  CPU          Intel Xeon CPU E5-2650 v3 2.30GHz, 2 cores
            Memory       1 GB
            Benchmarker  wrk 4.1.0-4
    Server  CPU          Intel Xeon CPU E5-2650 v3 2.30GHz, 4 cores
            Memory       1 GB
            HTTP Server  nginx 1.17.3
    CMDB    CPU          Intel Xeon CPU E5-2650 v3 2.30GHz, 1 core
            Memory       1 GB
            Database     PostgreSQL 11.3
    ・All instances were built on Sakura Cloud
    ・Linux Kernel 4.15 (Ubuntu Server 18.04.3 LTS)

  44. Configuration of each implementation
    ・Normal: no tracing is running
    ・Transtracer: tracing by Transtracer is running
    ・iptables NEW filter: logs new connections only
      ・-I INPUT -m state --state NEW -m limit -j TRACE-LOG
    ・iptables ESTB filter: logs every packet exchanged while a connection is established
      ・-I INPUT -m state --state ESTABLISHED -m limit -j TRACE-LOG

  45. Response latency overhead
    Average latency (ms) by number of connections
    Connections  5000   10000  15000  20000
    Normal       93.1   191.6  279.3  353.8
    Transtracer  94.7   188.3  291.8  401.2
    ESTB filter  115.0  236.0  359.0  462.5
    NEW filter   113.1  214.4  310.0  449.3
    ・Compared with Normal, Transtracer adds 1.7-13.4% overhead
    ・Compared with the ESTB filter of the iptables implementation, Transtracer reduces overhead by 13-20%
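The percentages quoted for latency can be reproduced from the plotted values. The sketch below assumes that the four values listed for each series correspond to 5,000, 10,000, 15,000, and 20,000 connections in that order; the function name `relative_change` is only for illustration.

```python
CONNECTIONS = [5000, 10000, 15000, 20000]
# Average latency in ms, assumed to be listed in order of connection count.
NORMAL = [93.1, 191.6, 279.3, 353.8]
TRANSTRACER = [94.7, 188.3, 291.8, 401.2]
ESTB_FILTER = [115.0, 236.0, 359.0, 462.5]


def relative_change(values, baseline):
    """Percentage change of each value against the baseline, to one decimal."""
    return [round((v - b) / b * 100, 1) for v, b in zip(values, baseline)]


if __name__ == "__main__":
    for n, vs_normal, vs_estb in zip(
        CONNECTIONS,
        relative_change(TRANSTRACER, NORMAL),
        relative_change(TRANSTRACER, ESTB_FILTER),
    ):
        print(f"{n:>6} connections: {vs_normal:+.1f}% vs Normal, {vs_estb:+.1f}% vs ESTB filter")
```

Under that assumption the comparison with Normal gives +1.7% at 5,000 connections and +13.4% at 20,000, matching the quoted range, while the 10,000-connection point comes out slightly negative. The comparison with the ESTB filter stays between roughly -13% and -20%.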

  46. CPU usage overhead
    Connections                5000   10000  15000  20000
    ttracerd CPU usage (%)     13.2   23.0   34.2   44.4
    ESTB filter CPU usage (%)  72.2   75.9   78.8   78.6
    Reading sockets time (ms)  102.3  199.1  317.8  408.6
    ・At 20,000 connections, Transtracer uses 44.4% CPU and the ESTB filter uses 78.6%
      ・A 43.5% reduction in CPU usage
    ・The NEW filter has the advantage because it uses CPU only when a connection is established
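The 43.5% figure is a relative reduction, not a difference in percentage points. A short calculation makes this explicit; it assumes the plotted values are listed in order of connection count, and the helper name `reduction_percent` is hypothetical.

```python
# CPU usage in %, assumed to be listed for 5,000 to 20,000 connections.
TTRACERD_CPU = [13.2, 23.0, 34.2, 44.4]
ESTB_FILTER_CPU = [72.2, 75.9, 78.8, 78.6]


def reduction_percent(baseline, value):
    """Relative reduction of `value` against `baseline`, in percent."""
    return round((baseline - value) / baseline * 100, 1)


if __name__ == "__main__":
    # (78.6 - 44.4) / 78.6 * 100 at 20,000 connections
    print(reduction_percent(ESTB_FILTER_CPU[-1], TTRACERD_CPU[-1]))
    print([reduction_percent(e, t) for e, t in zip(ESTB_FILTER_CPU, TTRACERD_CPU)])
```

The gap narrows as the number of connections grows, because the polling cost of ttracerd rises with the number of sockets to read while the ESTB filter's usage stays nearly flat.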

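  47. Relationship between polling interval and CPU usage
    ・How far does CPU usage fall when the polling interval is increased at 20,000 connections?
    ・CPU usage can be reduced to 8.6%, at the cost of possibly missing short-lived connections that last less than 5 seconds
    ・In environments that keep connections persistent, such as HTTP/2, missing short-lived connections is not a problem
    Polling interval  1     2     3     4     5
    CPU usage (%)     44.4  21.6  13.0  10.8  8.6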

  48. 5. Future work

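  49. Kubernetes support
    ・Place the Tracer process as a sidecar in each Pod
    ・Handle NAT when Service resources are used for Pod-to-Pod connections and connections outside the cluster
      ・Scan the Linux conntrack table to map the connections
    [Figure: Pods, each with a Tracer container, reach an Endpoint through NAT]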

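  50. Detecting connections by streaming
    ・Combine polling with streaming to detect the short-lived connections that polling misses
    ・Use eBPF to capture connect(2) and accept(2) events and obtain flow information
    ・For UDP, capture sendmsg(2) and recvmsg(2) events
    [Figure: on a Linux host, the Tracer receives a stream of events from the kernel about the TCP/UDP flows of user-space processes]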

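  51. Preliminary experiment on the load of eBPF streaming
    ・Load experiment in an environment without persistent connections, using tcpaccept from iovisor/bcc
    ・nginx was benchmarked with wrk (HTTP KeepAlive off) at 1,000 concurrent connections
    ・CPU usage was around 45-50% per core
    ・No significant degradation in response latency was observed
    ・Copying every eBPF event to userland may be the cause of the high CPU load
    See the experiment notes for details: https://www.notion.so/yuuk1/iovisor-bcc-tcpconnect-tcpaccept-af2d1fdce35c49fb945b548db373213d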

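  52. Reducing the setup cost of the CMDB
    ・Instead of using a CMDB, each agent keeps its own flow data in local storage
    ・A query is sent to any agent, and the flow data that matches the request is retrieved from the agents in a distributed manner
    ・When serving a query, each agent follows the dependencies on its own host and queries the agents on the neighboring hosts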
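The distributed retrieval described above can be modeled as a traversal in which each visited agent contributes its local flows and names the neighbors to ask next. This is a speculative sketch of the idea only: the `agents` mapping stands in for network calls to per-host agents, and `collect_flows` is a hypothetical name.

```python
def collect_flows(start_host, agents):
    """Gather flows reachable from `start_host` by asking agents hop by hop.

    agents: {host: set of (active_host, passive_host)} holding the flows
    that each host's agent stores locally.
    """
    visited = set()
    pending = [start_host]
    result = set()
    while pending:
        host = pending.pop()
        if host in visited or host not in agents:
            continue
        visited.add(host)
        for active, passive in agents[host]:
            result.add((active, passive))
            # The other end of the flow is the neighbor to query next.
            pending.append(passive if active == host else active)
    return result


if __name__ == "__main__":
    agents = {
        "a": {("a", "b")},
        "b": {("a", "b"), ("b", "c")},
        "c": {("b", "c")},
        "d": {("d", "e")},
    }
    print(sorted(collect_flows("a", agents)))
```

Hosts that are not connected to the starting point, such as `d` in this example, are never contacted, so the cost of a query follows the size of the dependency graph around the target rather than the size of the whole system.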

  53. 6. Summary

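  54. Summary
    ・Introduced Transtracer, a tool that traces dependencies between processes in order to predict reliability risk when a system is changed
    ・It can exhaustively detect dependencies between processes while keeping the overhead imposed on servers and applications low
    ・However, it may miss short-lived connections that last only a few seconds
    ・Future research and development will look at how far the overhead can be reduced on the premise that every connection flow is covered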

  55. If you have a problem that Transtracer might be able to solve, I would be glad to receive your feedback
    Thanks to id:masayoshi / @yoyogidesaiz for their ideas related to this presentation.