
Driving Mercari with 50+ custom plugins / Mackerel DAY

kazeburo
October 06, 2017

Driving Mercari with 50+ custom plugins
Mackerel Day 2017/10/05
#mackerelday


Transcript

  1. Driving Mercari with 50+ custom Plugins
     Masahiro Nagano id:kazeburo
     Mackerel Day! 2017/10/05
  2. Me
     • Masahiro Nagano / 長野雅広
     • id:kazeburo
     • Mercari, Inc. Principal Engineer, Site Reliability Engineering (SRE) Team
     • BASE, Inc. Technical Advisor
  3. Agenda
     • A brief introduction to Mercari and its infrastructure
     • Service/Role design & Deploy
     • Monitoring with Mackerel and Plugins
     • Other initiatives
  4. Mercari
     • Flea-market app
     • Take a photo with a smartphone and list an item easily
     • Safe and secure payments
     • Operating in Japan, the US, and the UK

  5. Infrastructure
     • JP: Ishikari DC, dedicated servers
     • US: Cloud
     • UK: Cloud

  6. Infrastructure
     • JP: Ishikari DC, dedicated servers
     • US: Cloud
     • UK: Cloud

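  7. Why we adopted Mackerel
     • To unify monitoring items and their content across different infrastructures
     • Previously we ran zabbix per Region; versions drifted apart and monitoring content diverged
     • Management through Service/Role
     • Flexible, easy-to-use service metrics
     • Nagios-compatible Plugins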
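  8. Monitoring other than Mackerel
     • Kurado: convenient because graph images can be viewed all at once; basic metrics are viewed here
     • New Relic: tracing of PHP internals; a reference for application tuning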

  9. Service/Role design & Deploy

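  10. Service design
     • Split Services by the Region in which the service is operated
     • mercari, mercari-us, mercari-gb
     • External (black-box) monitoring goes into separate Services
     • mercari-jp-external, mercari-us-external, mercari-gb-external
     • This is done to separate notification channels
     • QA environments and microservices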
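  11. Role design
     • Give meaning to the prefix of each Role name
     • role- The basic role of a server: role-mysql, role-application, and so on
     • z- Shared roles: most servers belong to z-common, and servers with php installed have z-php
     • x- Monitoring flags: role-mysql monitors replication, but adding x-mysql-master excludes a server from that monitoring
     • x- roles are often added by hand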
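  12. ⚙ mackerel-agent.conf
        $ cat mackerel-agent.conf
        apikey = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaa="
        include = "/etc/mackerel-agent/conf.d/*.conf"
        $ ls -1 conf.d/
        role-mysql.conf
        z-common-jp.conf
        z-postfix.conf
     => No settings in particular are written in mackerel-agent.conf itself
     • The conf files distributed differ from server to server
     • The conf file names mostly match the role names, but not exactly
     • We want the Roles assigned to a server to be set automatically somewhere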
  13. Automatic Role assignment
        $ cat /etc/sysconfig/mackerel-agent
        ROLES=$(grep -h role-def: /etc/mackerel-agent/conf.d/*.conf \
          | awk -F: '{printf "-role=mercari:" $2 " "}')
        OTHER_OPTS=$ROLES
     • Add a "#role-def:<role name>" line to a file under conf.d
     • It is read at startup and used as a startup option of the agent
        /usr/bin/mackerel-agent --pidfile=/var/run/mackerel-agent.pid --root=/var/lib/mackerel-agent \
          -role=mercari:role-mysql -role=mercari:z-common -role=mercari:z-postfix
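The role-def convention on this slide can be tried outside the agent. The following is a minimal sketch, assuming a hypothetical helper named `build_role_opts` (it is not part of mackerel-agent) that wraps the same grep/awk pipeline; the directory and the service name are parameters so that it can be run against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of mackerel-agent): build "-role=<service>:<role>"
# options from "#role-def:<role>" marker lines found in conf.d-style files.
build_role_opts() {
  conf_dir=$1
  service=$2
  # -h suppresses file names; the text after the first ":" is taken as the role name.
  grep -h 'role-def:' "$conf_dir"/*.conf \
    | awk -F: -v svc="$service" '{ printf "-role=%s:%s ", svc, $2 }'
}
```

For example, `build_role_opts /etc/mackerel-agent/conf.d mercari` would print one `-role=` option per marker line, in glob order of the conf files.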
  14. Role examples in practice
        $ head -1 role-mysql.conf z-common-jp.conf z-postfix.conf
        ==> role-mysql.conf <==
        ## role-def:role-mysql
        ==> z-common-jp.conf <==
        #role-def:z-common
        ==> z-postfix.conf <==
        #role-def:z-postfix
     In practice, role-def may also appear on multiple lines
  15. Example use of x- Roles
     • Switch disk monitoring thresholds by Role
     • (Used to cover gaps through operations)

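  16. Service/Role design & Deploy
     • Role prefixes carry meaning; Role names are written inside the conf files and set automatically
     • Reportedly this stops working once we move to systemd
     • conf files are distributed with Ansible
     • The playbook describes which conf files go to which servers
     • Plugins are managed in the same Ansible role and distributed together with the conf files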
  17. Monitoring with Mackerel and Plugins

  18. Monitoring in numbers
     • Monitoring rules: 265
     • Monitoring rules per Host
       • MySQL: 34
       • Application: 39
       • Search: 36
     • Custom Plugins: 50+ (check + metrics + utils)
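  19. Monitoring performed by z-common-jp
     • unbound process monitoring
     • Name resolution through unbound
     • crond process monitoring
     • sshd port monitoring
     • Change monitoring of the /etc/passwd file
     • Global IP and iptables
     • uname change monitoring
     • hostname change monitoring
     • uptime monitoring
     • inode monitoring
     • Memory errors
     • HW-RAID monitoring
     • [metrics] NTP
     • [metrics] Linux Lite (CPU, Load avg, Process, Memory)
     • [standard] File System
     • [standard] Swap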
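  20. Monitoring with Custom Plugins
     • unbound process monitoring
     • Name resolution through unbound
     • Name resolution by reading resolv.conf
     • crond process monitoring
     • sshd port monitoring
     • Change monitoring of the /etc/passwd file
     • Global IP and iptables
     • uname change monitoring
     • hostname change monitoring
     • uptime monitoring
     • inode monitoring
     • Memory errors
     • HW-RAID monitoring
     • [metrics] NTP
     • [metrics] Linux Lite (CPU, Load avg, Process, Memory)
     • [standard] File System
     • [standard] Swap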
  21. check_resolver
     • Reads resolv.conf on its own and performs name resolution
     • Prevents incidents where resolv.conf gets rewritten and name resolution fails
     • Uses Net::DNS::Lite by @kazuho (a pure-Perl DNS client)
        $ /etc/mackerel-agent/commands/check-resolver --host alive.local -w 2 -c 2
        OK: elapsed_time 0.000888 sec (alive.local IN A 10.0.0.1)
  22. diff-detector
     • Alerts when the output of a command changes
     • Watches `cat /etc/passwd`, `uname -a`, and `hostname`
     • https://github.com/kazeburo/diff-detector
        $ diff-detector -- date
        NG: detect difference: ```@@ -1 +1 @@ -Tue May 10 08:11:42 UTC 2016 +Tue May 10 08:11:43 UTC 2016```
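The idea of alerting on changed command output can be illustrated in a few lines of shell. This is a minimal sketch and not the implementation of diff-detector itself: the function name `detect_diff`, the state-file argument, and the behavior on the first run are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: compare a command's output with the output saved by the previous run.
# Exit codes follow the Nagios-style convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).
detect_diff() {
  state_file=$1
  shift
  current=$("$@") || { echo "UNKNOWN: command failed"; return 3; }
  if [ ! -f "$state_file" ]; then
    # Assumption: the first run has nothing to compare against and only saves state.
    printf '%s\n' "$current" > "$state_file"
    echo "OK: first run, output saved"
    return 0
  fi
  previous=$(cat "$state_file")
  printf '%s\n' "$current" > "$state_file"   # the next run compares against this output
  if [ "$previous" = "$current" ]; then
    echo "OK: no difference"
    return 0
  fi
  echo "NG: detect difference"
  return 2
}
```

A call such as `detect_diff /var/tmp/uname.state uname -a` would alert once after a change and return to OK on the following run, because the saved state is replaced each time.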
  23. check-iptables
     • Every Sakura dedicated server has a global IP; we disable it on servers that do not need it
     • Global IP enabled: alert if ip6?table_filter is not loaded
     • Global IP disabled: alert if ip6?table_filter is loaded
     • We found that a careless `iptables --list` loads iptable_filter and affects performance
        $ /etc/mackerel-agent/commands/check-iptables
        OK: does not have global-ip and iptables(iptable_filter) is disabled
  24. check-iptables
        #!/bin/sh
        set -e
        if ( ip addr | grep 'inet ' | fgrep -v 'inet 127.0.0.1' | grep -v -E '^ *inet (10\.|192\.168|10\.|172\.1[6789]\.|172\.2[0-9]\.|172\.3[01]\.)' > /dev/null 2>&1 ); then
          if ( lsmod | egrep 'ip6?table_filter' > /dev/null 2>&1 ); then
            echo "OK: have global-ip and iptables(iptable_filter) is enabled"
            exit 0
          else
            echo "NG: have global-ip. but iptables(iptable_filter) is disabled"
            exit 2
          fi
        else
          ...
        fi
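The decision logic of this check can be separated from the system commands so that it is easy to reason about. The sketch below is an illustration under stated assumptions: `is_private_ipv4` and `iptables_verdict` are hypothetical helper names, and the message for the "no global IP but module loaded" case is assumed wording, since that branch is elided on the slide.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Return 0 when the IPv4 address is loopback or RFC 1918 private space.
is_private_ipv4() {
  case $1 in
    127.*|10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Decide the check result from two yes/no facts.
# $1: "yes" if the host has a global IP, $2: "yes" if iptable_filter is loaded.
iptables_verdict() {
  case "$1:$2" in
    yes:yes) echo "OK: have global-ip and iptables(iptable_filter) is enabled"; return 0 ;;
    yes:no)  echo "NG: have global-ip. but iptables(iptable_filter) is disabled"; return 2 ;;
    no:no)   echo "OK: does not have global-ip and iptables(iptable_filter) is disabled"; return 0 ;;
    # Assumed wording: this branch is not shown in the original script.
    no:yes)  echo "NG: does not have global-ip. but iptables(iptable_filter) is enabled"; return 2 ;;
    *)       echo "UNKNOWN: expected yes/no arguments"; return 3 ;;
  esac
}
```

Keeping the verdict table in one `case` statement makes the four combinations from the previous slide visible at a glance.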
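  25. check-uptime
     • Detects unexpected reboots
     • The threshold is 2 minutes minus 10 seconds (110 s); a single alert arrives and recovers right away
     • The same kind of check is also done for MySQL and memcached
        $ /etc/mackerel-agent/commands/check-uptime -w 110 -c 110
        OK: up 59 days, 8:21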
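The threshold comparison behind such a check is small enough to sketch. This is not the check-uptime plugin itself; `check_uptime_seconds` is a hypothetical function that takes the uptime in whole seconds plus warning and critical thresholds, and the message format is an assumption.

```shell
#!/bin/sh
# Sketch only: classify an uptime (in whole seconds) against warning/critical thresholds.
# A host that has been up for less than the threshold is treated as freshly rebooted.
check_uptime_seconds() {
  uptime_sec=$1
  warn=$2
  crit=$3
  if [ "$uptime_sec" -lt "$crit" ]; then
    echo "CRITICAL: up $uptime_sec seconds (threshold $crit)"
    return 2
  elif [ "$uptime_sec" -lt "$warn" ]; then
    echo "WARNING: up $uptime_sec seconds (threshold $warn)"
    return 1
  fi
  echo "OK: up $uptime_sec seconds"
  return 0
}
```

On Linux the input could come from `/proc/uptime`, for example `check_uptime_seconds "$(cut -d. -f1 /proc/uptime)" 110 110`.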
  26. check-inode
     • Prevents inode exhaustion
     • Partition sizes differ slightly by OS and Role, so we monitor by percentage and alert if even one partition exceeds the threshold
     • mackerel-plugin-inode cannot use wildcards for the monitored metric
        $ /etc/mackerel-agent/commands/check-inode -w 80 -c 90
        OK: /:1%, /dev:1%, /dev/shm:1%, /run:1%, /sys/fs/cgroup:1%, /boot:1%, /run/user/1037:1%
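Percentage-based inode monitoring across every mount point can be sketched with awk. This is an illustration, not the check-inode plugin: `check_inode_usage` is a hypothetical function that reads `df -Pi` style output (header line, usage percentage in column 5, mount point in column 6) on standard input, and it assumes mount points contain no spaces.

```shell
#!/bin/sh
# Sketch only: read "df -Pi" style output on stdin and alert if any mount point
# exceeds the warning ($1) or critical ($2) inode usage percentage.
check_inode_usage() {
  awk -v warn="$1" -v crit="$2" '
    BEGIN { level = 0; list = "" }
    NR == 1 { next }            # skip the header line
    $5 == "-" { next }          # filesystem reports no inode usage
    {
      pct = $5
      sub(/%$/, "", pct)
      pct += 0
      list = list (list == "" ? "" : ", ") $6 ":" pct "%"
      if (pct > crit) level = 2
      else if (pct > warn && level < 2) level = 1
    }
    END {
      label = (level == 2) ? "CRITICAL" : ((level == 1) ? "WARNING" : "OK")
      print label ": " list
      exit level
    }'
}
```

A typical invocation would be `df -Pi | check_inode_usage 80 90`; the worst mount point decides the exit status, which matches the "alert if even one exceeds the threshold" rule.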
  27. check-machine-exceptions
     • Monitors the logs written when a memory fault is detected
     • After detection we request hardware maintenance
        $ /etc/mackerel-agent/commands/check-machine-exceptions
        OK: No Machine Check Exception found
  28. check-machine-exceptions
        ...
        sbridge: HANDLING MCE MEMORY ERROR
        CPU 0: Machine Check Exception: 0 Bank 8: cc0427c000010090
        TSC 0 ADDR 37805ac0 MISC 45048ce86
        PROCESSOR 0:406f1 TIME 1495654896 SOCKET 0 APIC 0
        [Hardware Error]: Machine check events logged
        EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#0_Ha#0_Channel#0_DIMM": 4255 Unknown error(s): memory read on FATAL area OVERFLOW: cpu=0 Err=0001:0090 (ch=0), addr = 0x37805ac0 => socket=0, ha=1, Channel=0(mask=1), rank=0
        ...
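  29. check-raid-disk (MegaRAID)
     • Uses MegaCLI to monitor the state of each physical disk; alerts unless the state is Spun Up
     • Storage on Sakura dedicated servers is always provided in a RAID configuration and is monitored via SNMP by the Sakura dedicated server service from the Global side
     • If the Global side is closed or SNMP is filtered, that monitoring does not happen, so we monitor it ourselves and request maintenance when there is a problem
     • An SSD has never failed on us
        $ /etc/mackerel-agent/commands/check-raid-disk
        Firmware state: Online, Spun Up
        Firmware state: Online, Spun Up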
  30. mackerel-plugin-ntpq
     • Visualizes the offset (absolute value) and the Sync status with the remote
     • Alerts when Sync < 0.1 or offset > 300
     • Some servers drift gradually, and the visualization helps when tuning ntp.conf
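  31. mackerel-plugin-linux-lite
     • Our servers range from 2 cores to 56 cores
     • Metrics are visualized so that consistent monitoring thresholds are easy to set
     • Free memory
     • CPU capped at 100%
     • Load average per core
     • Number of processes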
  32. Plugins used outside z-common

  33. ✂ periodic-checker
     • Runs a check only during specified times
     • Handles cases such as daily maintenance
     • I wish mackerel-agent had this built in
        $ periodic-checker --range 00:00-06:00,06:30-24:00 \
          -- check-tcp -p 80
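Time-window gating of a check can be sketched as follows. This is not the periodic-checker implementation: `in_time_ranges` and `run_in_ranges` are hypothetical helpers, the current time is passed in explicitly so the logic is easy to test, and treating the end of each range as exclusive is an assumption.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not the periodic-checker implementation.

# Return 0 when HH:MM ($1) falls inside any "HH:MM-HH:MM" range of the
# comma-separated list ($2). Range ends are treated as exclusive (an assumption).
in_time_ranges() {
  awk -v now="$1" -v ranges="$2" '
    function minutes(hhmm,    p) { split(hhmm, p, ":"); return p[1] * 60 + p[2] }
    BEGIN {
      t = minutes(now)
      n = split(ranges, r, ",")
      for (i = 1; i <= n; i++) {
        split(r[i], edge, "-")
        if (t >= minutes(edge[1]) && t < minutes(edge[2])) exit 0
      }
      exit 1
    }'
}

# Run the wrapped check only inside the ranges; otherwise report OK without running it.
# $1: ranges, $2: current time as HH:MM, remaining arguments: the check command.
run_in_ranges() {
  ranges=$1
  now=$2
  shift 2
  if in_time_ranges "$now" "$ranges"; then
    "$@"
  else
    echo "OK: outside monitoring window, check skipped"
  fi
}
```

Usage might look like `run_in_ranges 00:00-06:00,06:30-24:00 "$(date +%H:%M)" check-tcp -p 80`, which skips the check during the 06:00 to 06:30 maintenance gap.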
  34. check-dns-rr
     • Monitors services that run on consul + DNS
     • Like check-resolver, it reads resolv.conf on its own
     • Alerts when the number of hosts falls to the threshold or below
        $ /etc/mackerel-agent/commands/check-dns-rr \
          --host=production.app.service.dc.consul -w 2 -c 2
        OK: 3 dns-rr hosts found
  35. check-spf-and-reserve-lookup
     • Used for mail delivery
     • Checks that the IP is included in the SPF record and that the reverse lookup result contains the domain
        $ /usr/local/bin/check-spf-and-reserve-lookup 192.168.1.1 mercari.jp
        NG: spf check failed: result=SoftFail
        NG: reverse lookup dig failed: no result
  36. check-spf-and-reserve-lookup-all
     • Checks every Global IP the server has
        #!/bin/bash
        set -e
        for ip in $(ip addr | grep 'inet ' | fgrep -v 'inet 127.0.0.1' | grep -v -E '^ *inet (10\.|192\.168|10\.|172\.1[6789]\.|172\.2[0-9]\.|172\.3[01]\.)' | sort | awk '{print $2}' | awk -F / '{print $1}')
        do
          /usr/local/bin/check-spf-and-reserve-lookup $ip mercari.jp
        done
        echo "OK: ALL"
  37. check-mysql-slave-sql-error
     • A plugin written because "it would be handy to be told the reason when replication stops"
     • Supports Multi Source Replication
        $ /usr/local/bin/check-mysql-slave-sql-error --user=monitor --password=xxx
        mysql-slave-sql-error - MySQL slave SQL error CRITICAL: Last_SQL_Error found: Error 'Table 'tmp_replication_stop' already exists' on query. Default database: 'mercari'. Query: 'CREATE TABLE tmp_replication_stop ...
  38. check-mysql-msr
     • Monitors MySQL Multi Source Replication
     • Alerts if even one source has stopped or is delayed beyond the threshold
        $ /usr/local/bin/check-mysql-msr --host=127.0.0.1 --port=3306 --user=monitor --password=xxx -w 1 -c 1
        MySQL Multi Source Replication OK: [O]
        admin-db=io:Yes,sql:Yes,behind:0
        main-db=io:Yes,sql:Yes,behind:0
        web-db=io:Yes,sql:Yes,behind:0
     (The actual output is a single line)
  39. mackerel-plugin-msr
     • Visualization for check-mysql-msr
     • If monitoring could be set on mysql-msr.behind.*, check-mysql-msr might not be needed...?

  40. Open Source!
     https://github.com/kazeburo/
     Please let me know if something is missing or if there is something you would like to see
     https://github.com/kazeburo/custom-mackerel-plugins

  41. Other initiatives

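  42. Monitoring the number of customer inquiries
     • Notifications go to a Channel that many people have joined
     • Detects incidents and helps gauge the scope of impact
     • Not only SRE but all teams get involved, which speeds up the response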

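  43. Automatic extraction of unmonitored servers
     • They can be found by comparing the Mackerel API with a list of IP addresses
     • On AWS/GCE, the list of private IPs can be obtained through each cloud's API
     • On Sakura dedicated servers and Sakura Cloud it cannot be obtained
     • By design, the private network is something you manage yourself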
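  44. Reachability checks with fping
     • fping is a command that checks reachability in parallel
     • Run fping for each /24 network to build the list of IPs
     • Parallelizing fping further shortens the time
        $ seq 0 255 | xargs -I{} -P 16 fping -q -a -r 2 -i 10 -g x.y.{}.0/24
     (The xargs invocation is illustrative)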
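Once the list of reachable addresses exists, finding unmonitored servers is a set difference against the addresses known to Mackerel. The sketch below assumes both lists have already been saved to files with one IP address per line; how the monitored list is fetched from the Mackerel API is outside its scope, and `unmonitored_ips` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print addresses present in the "alive" list ($1) but absent from
# the "monitored" list ($2). Both are files with one IP address per line.
unmonitored_ips() {
  alive_sorted=$(mktemp)
  monitored_sorted=$(mktemp)
  sort -u "$1" > "$alive_sorted"
  sort -u "$2" > "$monitored_sorted"
  # comm -23 keeps only the lines that are unique to the first file.
  comm -23 "$alive_sorted" "$monitored_sorted"
  rm -f "$alive_sorted" "$monitored_sorted"
}
```

Sorting both inputs with the same `sort` invocation matters because `comm` expects its inputs in the same collation order.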
  45. Notifications to Slack
     • Posted to Slack twice a day
     • Servers that are not monitored
     • Servers in standby or poweroff status
     • Servers with relaxed disk monitoring
     • Done for JP/US/UK
  46. Summary

  47. Solve problems by writing code
     Mackerel is a tool well worth using

  48. We're Hiring!
     Mercari, taking on the world
     SRE: supporting it with multi-layered, redundant, multi-faceted infrastructure and reliability
     www.mercari.com/jp/jobs/