Cloud-Native Monitoring with Mackerel / Mackerel Day#2

FUJIWARA Shunichiro

December 23, 2019

Transcript

  1. Cloud-native monitoring with Mackerel
    2019.12.23 Mackerel Day #2 @fujiwara

  2. @fujiwara
    Mackerel Ambassador (… ~)
    github.com/kayac/ecspresso: deployment tool for Amazon ECS
    github.com/fujiwara/lambroll: deployment tool for AWS Lambda

  3. Game & Community

  4. None
  5. Cloud native?
    CNCF Cloud Native Definition v1.0
    "Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach."
    https://github.com/cncf/toc/blob/master/DEFINITION.md

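  6. Containers, service meshes, microservices...
    Using those technologies ≠ cloud native.
    "Design for Failure", the mechanisms and components that realize it, and mastering them: that is what cloud native really means.
    "Design so that the system recovers automatically when a failure occurs. Design the architecture so that a failure has no user impact in the first place." [2]
    https://speakerdeck.com/toricls/design-for-failure-is-the-true-cloud-native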

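  7. "Design for Failure"
    Design systems on the assumption that failures happen.
    • Instances die
    • Managed services die too (though they usually fail over)
    • Whole data centers die (rarely, but it happens)
    Decide how far to plan for and where to give up.
    Monitoring should be designed with this factored in → cloud-native monitoring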
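  8. Monitoring targets increase and decrease dynamically
    Containers are reborn on every deploy.
    Even without containers:
    • Spot instances are used with EC2 Auto Scaling
    • They go down fairly quickly, and others come up
    Managed services can now scale with load as well (Aurora Auto Scaling, etc.)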

  9. The first step toward cloud native with Mackerel
    "Stop connectivity detection"

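  10. Stop connectivity detection
    This alert only has a Critical level.
    • Nobody needs to jump out of bed because a development environment went down at midnight
    • In production, no single host going down should affect the service (build it that way)
    Designing on the premise that hosts die = cloud native
    Critical alerts are only for things that affect service continuity.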
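  11. How cloud native is Mackerel?
    • It follows hosts automatically as they increase and decrease
        mackerel-container-agent
        AWS / Azure integration
    • Host metrics become invisible once a host is retired
        Only some metrics such as CPU remain; custom metrics do not
        We want to track sums and averages over the long term, but they disappear
    • Per-host billing
        Micro hosts (priced in yen per month) exist, but for cheap targets with few metrics, such as SQS or Lambda, that is still a little expensive...
        A plan billed by total metric count would be welcome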
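  12. While waiting for managed services to evolve
    To move cloud-native monitoring forward: "gap-filler OSS" (sukima-kagu OSS, named after furniture made to fit a gap)
    "Software developed to fill the gaps and achieve better operations when the features of managed services, or the integration between services, fall short for your own operations. The term refers to open-source software in particular." [3]
    https://speakerdeck.com/fujiwara3/xi-jian-jia-ju-oss-false-susume
    https://speakerdeck.com/fujiwara3/aws-dev-day-tokyo-2019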

  13. Gap-filler OSS introduced today
    • maprobe
    • mackerel-plugin-prometheus-query
    https://github.com/fujiwara/mackerel-plugin-prometheus-query
    https://github.com/fujiwara/maprobe

  14. [Problem] For hosts registered through AWS integration, we also want the metrics that mackerel-plugin collects

  15. Example: running mackerel-plugin-mysql against Amazon RDS (MySQL)

  16. [Solution] Run the plugin from mackerel-agent on some host?

        [plugin.metrics.rds01]
        command = "mackerel-plugin-mysql -host='rds01.***.ap-northeast-1.rds.amazonaws.com' (omitted)"
        custom_identifier = "rds01.***.ap-northeast-1.rds.amazonaws.com"

        [plugin.metrics.rds02]
        command = "mackerel-plugin-mysql -host='rds02.***.ap-northeast-1.rds.amazonaws.com' (omitted)"
        custom_identifier = "rds02.***.ap-northeast-1.rds.amazonaws.com"

    • If this host goes down, metric collection stops
    • Changing the configuration to monitor hosts added later is tedious
    • These days there may be no host to run mackerel-agent on at all...
    We want to follow changes in monitoring targets automatically!
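  17. maprobe
    An agent that, for hosts registered in Mackerel:
    • performs external monitoring (ping / tcp / http)
    • runs mackerel-plugin commands and posts the results as host metrics
    • aggregates already-registered host metrics and posts them as service metrics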
  18. [Solution] Run the plugin with maprobe

        probes:
          - service: production
            role: RDS
            command:
              command:
                - 'mackerel-plugin-mysql'
                - '-host={{.Host.CustomIdentifier}}'
                - '-username=root'
                - '-password={{env "RDS_PASSWORD"}}'

    Runs mackerel-plugin-mysql against every host in service production, role RDS.
    Sends the results to Mackerel as host metrics of each individual host.
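
The per-host expansion in slide 18 can be pictured roughly as follows. This is a simplified sketch, not maprobe's implementation: plain string substitution stands in for its template expansion, and `build_command` and the host identifiers are hypothetical names chosen for illustration.

```python
# Placeholder used in the slide's configuration.
PLACEHOLDER = "{{.Host.CustomIdentifier}}"

def build_command(template, custom_identifier):
    """Return the argv for one host by filling in its custom identifier."""
    return [arg.replace(PLACEHOLDER, custom_identifier) for arg in template]

template = [
    "mackerel-plugin-mysql",
    "-host={{.Host.CustomIdentifier}}",
    "-username=root",
]

# Hypothetical identifiers; in the talk these come from hosts registered in Mackerel.
hosts = ["rds01.example.internal", "rds02.example.internal"]

commands = [build_command(template, h) for h in hosts]
for argv in commands:
    print(" ".join(argv))
```

One probe definition yields one command line per matching host, which is why adding a host needs no configuration change.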
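  19. maprobe follows changes in target hosts automatically
    It queries the Mackerel API every minute to look up hosts.
    • Follows host increases and decreases automatically
    A Docker container image is available:
        docker pull fujiwara/maprobe
    A configuration file placed on S3 is reloaded automatically.
    • Updating the file on S3 applies the configuration with no container build or deploy
        maprobe agent --config s3://example.com/config.yaml
    https://hub.docker.com/r/fujiwara/maprobe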
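
The "follow automatically" behaviour in slide 19 amounts to comparing the host list from one poll with the next. The sketch below shows only that comparison; `diff_hosts` and the host IDs are hypothetical, and no Mackerel API call is made.

```python
def diff_hosts(previous, current):
    """Return (added, removed) host IDs between two consecutive polls."""
    prev, cur = set(previous), set(current)
    return sorted(cur - prev), sorted(prev - cur)

# Hypothetical host IDs returned by two polls one minute apart.
poll_1 = ["hostA", "hostB", "hostC"]
poll_2 = ["hostB", "hostC", "hostD"]

added, removed = diff_hosts(poll_1, poll_2)
print("start probing:", added)
print("stop probing:", removed)
```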
  20. [Problem] Once connectivity detection is turned off, how do we monitor whether hosts are alive?

  21. [Solution] maprobe's health check feature
    Built-in health checks are available (ping, TCP, HTTP).

        probes:
          - service: production
            role: EC2
            ping:
              address: "{{ .Host.IPAddresses.eth0 }}"
          - service: production
            role: ElastiCacheRedis
            tcp:
              host: "{{ .Host.CustomIdentifier }}"
              port: 6379
              send: "PING\n"
              expect_pattern: "PONG"
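
As a rough illustration of what a TCP probe with `send` and `expect_pattern` does, here is a minimal sketch using only the Python standard library. The function name `tcp_probe` and its defaults are assumptions for this example; maprobe's own implementation may differ.

```python
import re
import socket

def tcp_probe(host, port, send, expect_pattern, timeout=5.0):
    """Connect, send a payload, and match the reply against a regex.

    Returns True when the reply matches, False on a mismatch or any socket error.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(send.encode())
            reply = conn.recv(4096).decode(errors="replace")
    except OSError:
        # Refused connections and timeouts both count as a failed probe.
        return False
    return re.search(expect_pattern, reply) is not None
```

Against a Redis endpoint this would be called as `tcp_probe(host, 6379, "PING\n", "PONG")`, mirroring the configuration in slide 21.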
  22. Health checks with maprobe
    maprobe health check results become host metrics (they are not check monitoring).

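  23. What is not great about check monitoring (personal opinion)
    Changing a setting means editing a file and deploying.
        "Turn monitoring off just for now" or "change the threshold" is hard to do.
    How thresholds are evaluated differs from plugin to plugin.
        Does --critical-under mean "less than or equal to" or "less than"?
    Alerts tend to fire on many hosts at once.
        Check failures can arrive from dozens of hosts while the cause is a single thing
        (NTP clock drift, a daemon configuration deploy mistake, ...)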
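  24. Check monitoring = a special case of metric monitoring
    The same thing can be done by storing a metric and evaluating it.
    Make use of metric monitoring and expression monitoring.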

  25. Example: liveness monitoring with ping
        sum(role(production:EC2, ping.count.failure))
    Warn if ping fails for any host in production:EC2.
    If a few hosts go down but the service is still being provided, it is not Critical.
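
To make the contrast with per-host check monitoring concrete, the sketch below mimics what an expression such as `sum(role(production:EC2, ping.count.failure))` evaluates to. It is plain Python with made-up host names and values and a hypothetical `evaluate` helper, not Mackerel's evaluation engine.

```python
def evaluate(failures_by_host, warning=1, critical=None):
    """Sum ping failures over all hosts in a role and map the total to a status."""
    total = sum(failures_by_host.values())
    if critical is not None and total >= critical:
        return total, "CRITICAL"
    if total >= warning:
        return total, "WARNING"
    return total, "OK"

# Hypothetical latest ping.count.failure value per host.
latest = {"web-1": 0, "web-2": 3, "web-3": 0}

total, status = evaluate(latest)
print(total, status)
```

One alert is raised for the role as a whole, and leaving `critical` unset matches the slide's point that losing a few hosts is only a warning.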

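  26. Example: alerting on the number of jobs backed up in a job queue
        sum(role(production:job-queue, custom.gearmand.queue.*.total))
    Job queues run on several hosts.
    The number of backed-up jobs is recorded as a metric.
    When jobs pile up, they usually pile up in every queue.
    Alerting per host with check monitoring tends to fire on all hosts.
    Looking at the sum shows the overall processing state.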
  27. [Problem] We want to keep tracking host metrics that disappear when a host is retired

  28. Custom metrics disappear when the host is retired

  29. [Solution] Aggregate and store host metrics with maprobe

        aggregates:
          - service: production
            role: push-server
            metrics:
              - name: custom.push.messages.sent
                outputs:
                  - func: sum
                    name: custom.push.messages.total_sent

    For service production, role push-server, fetch the host metric push.messages.sent from every host
    → store the computed result as a service metric.
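  30. maprobe aggregate functions
    Currently sum, min, max, average and count are supported.
    percentile would probably be useful too, but it is not implemented yet.
    If custom metrics did not disappear, expression graphs alone would be enough, so please consider it.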
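
The five aggregate functions named above can be sketched in a few lines. The mapping and the sample values below are illustrative assumptions rather than maprobe's code; they only show how per-host values collapse into one service-level value.

```python
from statistics import mean

# Function names follow the slide; the implementations are ordinary Python built-ins.
AGGREGATES = {
    "sum": sum,
    "min": min,
    "max": max,
    "average": mean,
    "count": len,
}

def aggregate(values, func):
    """Apply the named aggregate function to a list of per-host metric values."""
    return AGGREGATES[func](values)

# Hypothetical custom.push.messages.sent values from three hosts.
sent_per_host = [120, 80, 100]

total_sent = aggregate(sent_per_host, "sum")
print("custom.push.messages.total_sent", total_sent)
```

Because the result is stored as a service metric, it survives even after the individual hosts are retired.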

  31. [Problem] Monitoring even more cloud-native resources

  32. Microservices! Service mesh!
    We have started using Envoy and want to collect its metrics.
    Hitting /stats on one of our Envoys gives:

        $ curl -s x.x.x.x:9901/stats
        ...
        cluster.web.default.total_match_count: 1
        cluster.web.external.upstream_rq_200: 988
        cluster.web.external.upstream_rq_2xx: 988
        cluster.web.external.upstream_rq_302: 13
        cluster.web.external.upstream_rq_3xx: 13
        cluster.web.external.upstream_rq_400: 3
        cluster.web.external.upstream_rq_403: 26
        cluster.web.external.upstream_rq_404: 5
        cluster.web.external.upstream_rq_4xx: 34
        ....

    https://envoyproxy.io
  33. Should we just send all of this to Mackerel?

        cluster.web.default.total_match_count: 1
        cluster.web.external.upstream_rq_200: 988
        cluster.web.external.upstream_rq_2xx: 988
        cluster.web.external.upstream_rq_302: 13
        cluster.web.external.upstream_rq_3xx: 13
        cluster.web.external.upstream_rq_400: 3
        cluster.web.external.upstream_rq_403: 26
        cluster.web.external.upstream_rq_404: 5
        cluster.web.external.upstream_rq_4xx: 34
        ...
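  34. Inquiry
    I casually asked through the feedback form:
    "We plan to introduce envoy (https://www.envoyproxy.io/) soon. Does Mackerel plan to release an official plugin for collecting envoy stats?"
    "Regarding an envoy plugin, at present we have no plans to release one officially."
    → I guess I will build it myself...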

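  35. However, Envoy emits a large number of metrics

        $ curl -s x.x.x.x:9901/stats | wc -l
        337

    If all of these go to Mackerel...
    A micro host is counted as one host per fixed block of metrics, so this is equivalent to several hosts.
    That multiple of the micro-host price (excluding tax) would apply to every host running Envoy.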
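
The count above treats each line of the /stats output as one metric. A minimal sketch of turning such `name: value` lines into metrics is shown below; `parse_stats` is a hypothetical helper, and the sample is a shortened copy of the output quoted in the slides.

```python
def parse_stats(text):
    """Parse 'name: value' lines into a dict, skipping lines without an integer value."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'name: value' line
        try:
            stats[name.strip()] = int(value)
        except ValueError:
            continue  # non-integer values are ignored in this sketch
    return stats

sample = """\
cluster.web.default.total_match_count: 1
cluster.web.external.upstream_rq_200: 988
cluster.web.external.upstream_rq_2xx: 988
cluster.web.external.upstream_rq_4xx: 34
"""

stats = parse_stats(sample)
print(len(stats), "metrics")
```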
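  36. Change of plan
    Sending everything is too costly.
    On the other hand, we do not yet know which values we actually need.
    Our experience operating Envoy is limited, so we want to decide which values to watch as we operate it.
    Writing a plugin that picks out the useful metrics requires operational experience.
    In other words, for now we want to collect everything.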

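  37. What about Prometheus?
    A pull-based metric collection and monitoring tool.
    Stored values can be flexibly processed and retrieved with PromQL queries.
    Long-term metric retention is not much of a design goal.
    Visualizing with something like Grafana is the usual approach?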

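  38. Envoy → Prometheus → Mackerel
    We want to alert from Mackerel and look at graphs there too.
    We cannot yet tell which values to collect, so for the time being we want to keep all of them.
    1. Envoy → Prometheus (this already exists)
    2. Prometheus → output query results in metric plugin format
    3. Store metric-plugin-format output in Mackerel (this already exists too)
    So only step 2 needs to be built!
    Any value that turns out to be missing can be fetched by querying Prometheus when needed.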
  39. I built mackerel-plugin-prometheus-query
    github.com/fujiwara/mackerel-plugin-prometheus-query

        $ mackerel-plugin-prometheus-query \
            -query "up" \
            -metric-key-format "promq.{job}.{instance}"
        promq.web.10_1_129_175_9901 1 1575941187
        promq.web.10_1_130_170_9901 1 1575941187
        promq.web.10_1_131_53_9901 1 1575941187
        promq.prometheus.localhost_9090 1 1575941187
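  40. Example queries to Prometheus
    Example: a query for the number of retries to the upstream over one minute

        sum(
          delta(
            envoy_cluster_upstream_rq_retry{envoy_cluster_name="web"}[1m]
          )
        )

    Send Mackerel the overall value rather than per-host values.
    The individual values are still available in Prometheus.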

  41. Send the plugin output with mkr throw

        $ mackerel-plugin-prometheus-query \
            -query 'sum(delta(envoy_cluster_upstream_rq_retry_success{envoy_cluster_name="web"}[1m]))' \
            -metric-key-format "envoy.web.upstream.retry.success" \
            | mkr throw --service production
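
The `-metric-key-format` option seen in slides 39 and 41 expands label placeholders into a metric name. The sketch below reproduces the sample output from slide 39 under one assumption inferred from that output: characters outside letters, digits, `_` and `-` in a label value are replaced with `_`. `metric_key` and `plugin_line` are hypothetical names, and the plugin's real rules may differ.

```python
import re

def metric_key(fmt, labels):
    """Expand {label} placeholders, replacing unsafe characters in values with '_'."""
    def expand(match):
        value = labels.get(match.group(1), "")
        return re.sub(r"[^A-Za-z0-9_-]", "_", value)
    return re.sub(r"\{(\w+)\}", expand, fmt)

def plugin_line(name, value, epoch):
    """Format one metric as a tab-separated name, value, epoch-seconds line."""
    return f"{name}\t{value}\t{epoch}"

labels = {"job": "web", "instance": "10.1.129.175:9901"}
name = metric_key("promq.{job}.{instance}", labels)
print(plugin_line(name, 1, 1575941187))
```

A format string without placeholders, such as `envoy.web.upstream.retry.success`, passes through unchanged, which suits a query that returns a single aggregated value.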
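  42. Summary
    Cloud-native monitoring means a design that tolerates failure, plus watching the health of the system as a whole.
    Gap-filler OSS as tools for that:
    • maprobe
        automatic following, external monitoring, plugin monitoring, metric aggregation
    • mackerel-plugin-prometheus-query
        short-term data in Prometheus
        long-term data and alerting in Mackerel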