Slide 1

Slide 1 text

Driving Mercari with 50+ custom Plugins Masahiro Nagano id:kazeburo Mackerel Day! 2017/10/05

Slide 2

Slide 2 text

Me • Masahiro Nagano / 長野雅広 • id:kazeburo • Mercari, Inc
 Principal Engineer
 Site Reliability Engineering (SRE) Team • BASE, Inc Technical Advisor

Slide 3

Slide 3 text

Agenda • A brief introduction to Mercari and its infrastructure • Service/Role design & Deploy • Monitoring with Mackerel and Plugins • Other initiatives

Slide 4

Slide 4 text

Mercari • Flea-market app • Take a photo with your smartphone and list an item easily • Safe and secure payments • Available in Japan, the US, and the UK

Slide 5

Slide 5 text

Infrastructure • JP: dedicated servers in the Ishikari DC • US: Cloud • UK: Cloud

Slide 6

Slide 6 text

Infrastructure • JP: dedicated servers in the Ishikari DC • US: Cloud • UK: Cloud

Slide 7

Slide 7 text

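Why we adopted Mackerel • To unify monitoring items and their contents across different infrastructures • Previously we ran zabbix per Region; versions drifted apart and monitoring contents diverged • Management through Service/Role • Flexible, easy-to-use service metrics • Nagios-compatible Plugins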

Slide 8

Slide 8 text

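Monitoring other than Mackerel • Kurado: convenient because all the graph images can be viewed at once; basic metrics are viewed here • New Relic: traces of PHP internals; a reference for application tuning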

Slide 9

Slide 9 text

Service/Role Design & Deploy

Slide 10

Slide 10 text

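Service design • Split Services by the Region in which the service runs • mercari, mercari-us, mercari-gb • External (black-box) monitoring lives in separate Services • mercari-jp-exetenal, mercari-us-external, mercari-gb-external • This is to separate notification channels • QA environments and microservices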

Slide 11

Slide 11 text

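Role design • Give the prefix of each Role name a meaning • role- The basic role of the server, such as role-mysql or role-application • z- Shared roles. Most servers belong to z-common, and servers with PHP installed also have z-php • x- Flags for monitoring. role-mysql performs replication monitoring, but adding x-mysql-master excludes a server from it • x- Roles are mostly added by hand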

Slide 12

Slide 12 text

mackerel-agent.conf
    $ cat mackerel-agent.conf
    apikey = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaa="
    include = "/etc/mackerel-agent/conf.d/*.conf"
    $ ls -1 conf.d/
    role-mysql.conf
    z-common-jp.conf
    z-postfix.conf
=> Nothing else is configured in mackerel-agent.conf itself. The conf files distributed differ from server to server. Their names mostly match the Role names, but not exactly. We want the Roles assigned to a server to be set automatically somewhere.

Slide 13

Slide 13 text

Automatic Role assignment
    $ cat /etc/sysconfig/mackerel-agent
    ROLES=$(grep -h role-def: /etc/mackerel-agent/conf.d/*.conf \
      | awk -F: '{printf "-role=mercari:" $2 " "}')
    OTHER_OPTS=$ROLES
Adding a "#role-def:<role name>" line to a file under conf.d causes it to be read at startup and used as a startup option of the agent:
    /usr/bin/mackerel-agent --pidfile=/var/run/mackerel-agent.pid --root=/var/lib/mackerel-agent \
      -role=mercari:role-mysql -role=mercari:z-common -role=mercari:z-postfix
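The extraction step on this slide can be tried in isolation. The sketch below wraps the same grep/awk pipeline in a function that takes the conf directory and the Service name as arguments. The function name build_role_opts and its parameters are made up for illustration and are not part of mackerel-agent.

```shell
#!/bin/sh
# Hypothetical helper (not part of mackerel-agent): collect "#role-def:<role>"
# lines from the *.conf files in a directory and print them as
# -role=<service>:<role> options, separated by spaces.
build_role_opts() {
    conf_dir=$1   # e.g. /etc/mackerel-agent/conf.d
    service=$2    # e.g. mercari
    grep -h 'role-def:' "$conf_dir"/*.conf 2>/dev/null \
        | awk -F: -v svc="$service" '{ printf "-role=%s:%s ", svc, $2 }'
}

# Example: ROLES=$(build_role_opts /etc/mackerel-agent/conf.d mercari)
```

Because awk splits on ":" and prints only the second field, both "#role-def:z-common" and "## role-def:role-mysql" yield the bare Role name.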

Slide 14

Slide 14 text

A real example of Roles
    $ head -1 role-mysql.conf z-common-jp.conf z-postfix.conf
    ==> role-mysql.conf <==
    ## role-def:role-mysql
    ==> z-common-jp.conf <==
    #role-def:z-common
    ==> z-postfix.conf <==
    #role-def:z-postfix
In practice, a conf file may contain more than one role-def line.

Slide 15

Slide 15 text

Example use of x- Roles • Switch the disk-monitoring threshold by Role • (used to cover gaps operationally)

Slide 16

Slide 16 text

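Service/Role Design & Deploy • Role prefixes carry meaning. Role names are written inside the conf files and applied automatically • Rumor has it that this stops working under systemd • conf files are distributed with Ansible • The playbook describes which conf files go to which servers • Plugins are managed in the same Ansible role and distributed together with the conf files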

Slide 17

Slide 17 text

Monitoring with Mackerel and Plugins

Slide 18

Slide 18 text

Monitoring by the numbers • Number of monitoring rules: 265 • Monitoring rules per Host • MySQL: 34 • Application: 39 • Search: 36 • Custom Plugin: 50+ (check + metrics + utils)

Slide 19

Slide 19 text

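Monitoring done in z-common-jp • Process monitoring of unbound • Name resolution through unbound • Process monitoring of crond • Port monitoring of sshd • Change monitoring of the /etc/passwd file • Global IP and iptables • Change monitoring of uname • Change monitoring of hostname • uptime monitoring • inode monitoring • Memory errors • HW-RAID monitoring • [metrics] NTP • [metrics] Linux Lite (CPU, Load avg, Process, Memory) • [standard] File System • [standard] Swap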

Slide 20

Slide 20 text

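Monitoring with Custom Plugins • Process monitoring of unbound • Name resolution through unbound • Name resolution by reading resolv.conf • Process monitoring of crond • Port monitoring of sshd • Change monitoring of the /etc/passwd file • Global IP and iptables • Change monitoring of uname • Change monitoring of hostname • uptime monitoring • inode monitoring • Memory errors • HW-RAID monitoring • [metrics] NTP • [metrics] Linux Lite (CPU, Load avg, Process, Memory) • [standard] File System • [standard] Swap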

Slide 21

Slide 21 text

check-resolver • Reads resolv.conf on its own and resolves names • Guards against incidents in which resolv.conf gets rewritten and name resolution starts failing • Uses @kazuho's Net::DNS::Lite (a pure-Perl DNS client)
    $ /etc/mackerel-agent/commands/check-resolver --host alive.local -w 2 -c 2
    OK: elapsed_time 0.000888 sec (alive.local IN A 10.0.0.1)

Slide 22

Slide 22 text

diff-detector • Alerts when the output of a command changes • We watch `cat /etc/passwd`, `uname -a`, and `hostname` • https://github.com/kazeburo/diff-detector
    $ diff-detector -- date
    NG: detect difference: ```@@ -1 +1 @@
    -Tue May 10 08:11:42 UTC 2016
    +Tue May 10 08:11:43 UTC 2016```
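The idea behind this check can be sketched in a few lines of shell. This is not the actual diff-detector implementation: the function name detect_diff, the state-file argument, and the choice to overwrite the saved copy on every run (so that one change produces one alert) are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, not the real diff-detector: run a command and compare
# its output with the copy saved by the previous run.
# Returns 0 (OK) when unchanged or on the first run, 2 (CRITICAL) on a change.
detect_diff() {
    state_file=$1; shift
    current=$("$@" 2>&1)
    if [ ! -f "$state_file" ]; then
        printf '%s\n' "$current" > "$state_file"
        echo "OK: first run, state saved"
        return 0
    fi
    previous=$(cat "$state_file")
    # Assumption: always store the latest output, so an alert fires once per change.
    printf '%s\n' "$current" > "$state_file"
    if [ "$current" = "$previous" ]; then
        echo "OK: no difference"
        return 0
    fi
    echo "NG: detect difference"
    return 2
}

# Example: detect_diff /var/tmp/uname.state uname -a
```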

Slide 23

Slide 23 text

check-iptables • Every Sakura dedicated server comes with a global IP. We disable it on servers that do not need one • Global IP enabled: alert if ip6?table_filter is not loaded • Global IP disabled: alert if ip6?table_filter is loaded • Catches cases where a careless iptables --list loads iptable_filter and affects performance
    $ /etc/mackerel-agent/commands/check-iptables
    OK: does not have global-ip and iptables(iptable_filter) is disabled

Slide 24

Slide 24 text

check-iptables
    #!/bin/sh
    set -e
    if ( ip addr | grep 'inet ' | fgrep -v 'inet 127.0.0.1' | grep -v -E '^ *inet (10\.|192\.168|10\.|172\.1[6789]\.|172\.2[0-9]\.|172\.3[01]\.)' > /dev/null 2>&1 ); then
      if ( lsmod | egrep 'ip6?table_filter' > /dev/null 2>&1 ); then
        echo "OK: have global-ip and iptables(iptable_filter) is enabled"
        exit 0
      else
        echo "NG: have global-ip. but iptables(iptable_filter) is disabled"
        exit 2
      fi
    else
      ...
    fi

Slide 25

Slide 25 text

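check-uptime • Detects unexpected reboots • The threshold is 2 minutes minus 10 seconds (110 seconds). The alert fires once and recovers right away • Also done for MySQL and memcached
    $ /etc/mackerel-agent/commands/check-uptime -w 110 -c 110
    OK: up 59 days, 8:21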
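A minimal sketch of the threshold logic follows, assuming Nagios-style exit codes (0 OK, 1 WARNING, 2 CRITICAL). The function name check_uptime and its argument order are hypothetical, and the real plugin may be written quite differently. If checks run roughly once a minute, a rebooted host stays under a 110-second threshold for only a check or two, which would explain why the alert opens once and then closes by itself.

```shell
#!/bin/sh
# Hypothetical sketch: classify an uptime value (in seconds) against
# warning/critical thresholds. Exit codes: 0=OK, 1=WARNING, 2=CRITICAL.
check_uptime() {
    uptime_sec=$1; warn=$2; crit=$3
    if [ "$uptime_sec" -lt "$crit" ]; then
        echo "CRITICAL: up ${uptime_sec}s (< ${crit}s)"
        return 2
    elif [ "$uptime_sec" -lt "$warn" ]; then
        echo "WARNING: up ${uptime_sec}s (< ${warn}s)"
        return 1
    fi
    echo "OK: up ${uptime_sec}s"
    return 0
}

# On Linux the first field of /proc/uptime is the uptime in seconds:
# check_uptime "$(cut -d. -f1 /proc/uptime)" 110 110
```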

Slide 26

Slide 26 text

check-inode • Prevents inode exhaustion • Partition sizes differ slightly by OS and Role, so we monitor by percentage and alert if even one partition exceeds the threshold • mackerel-plugin-inode does not allow wildcards in the metric names to be monitored
    $ /etc/mackerel-agent/commands/check-inode -w 80 -c 90
    OK: /:1%, /dev:1%, /dev/shm:1%, /run:1%, /sys/fs/cgroup:1%, /boot:1%, /run/user/1037:1%
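The "alert if any partition exceeds the threshold" rule can be sketched as a filter over df output. The function name check_inode is hypothetical, the column positions assume the usual `df -P -i` layout on Linux (IUse% in column 5, mount point in column 6), and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical sketch: read `df -P -i` style output on stdin, list the inode
# usage of every mount point, and raise the status if any exceeds a threshold.
# Exit codes: 0=OK, 1=WARNING, 2=CRITICAL.
check_inode() {
    warn=$1; crit=$2
    awk -v w="$warn" -v c="$crit" '
        BEGIN { rc = 0 }
        NR == 1 { next }                  # skip the header line
        {
            pct = $5                      # IUse% column, e.g. "12%"
            sub(/%/, "", pct)
            if (pct !~ /^[0-9]+$/) next   # skip "-" entries without inode counts
            pct += 0
            list = list (list != "" ? ", " : "") $6 ":" pct "%"
            if (pct > c) rc = 2
            else if (pct > w && rc < 1) rc = 1
        }
        END {
            label = (rc == 2) ? "CRITICAL" : ((rc == 1) ? "WARNING" : "OK")
            print label ": " list
            exit rc
        }'
}

# Usage on a Linux host: df -P -i | check_inode 80 90
```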

Slide 27

Slide 27 text

check-machine-exceptions • Watches the logs emitted when a memory fault is detected • After detection, we request hardware maintenance
    $ /etc/mackerel-agent/commands/check-machine-exceptions
    OK: No Machine Check Exception found

Slide 28

Slide 28 text

check-machine-exceptions
    ...
    sbridge: HANDLING MCE MEMORY ERROR
    CPU 0: Machine Check Exception: 0 Bank 8: cc0427c000010090
    TSC 0 ADDR 37805ac0 MISC 45048ce86 PROCESSOR 0:406f1 TIME 1495654896 SOCKET 0 APIC 0
    [Hardware Error]: Machine check events logged
    EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#0_Ha#0_Channel#0_DIMM": 4255 Unknown error(s): memory read on FATAL area OVERFLOW: cpu=0 Err=0001:0090 (ch=0), addr = 0x37805ac0 => socket=0, ha=1, Channel=0(mask=1), rank=0
    ...

Slide 29

Slide 29 text

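check-raid-disk (MegaRAID) • Uses MegaCLI to monitor the state of each physical disk; alerts if a disk is not Spun Up • Storage on Sakura dedicated servers is always provided in a RAID configuration and is monitored over SNMP from the Sakura dedicated-server side through the Global interface • If Global is closed or SNMP is filtered, that monitoring does not happen. We monitor it ourselves and request maintenance when there is a problem • No SSD has ever failed
    $ /etc/mackerel-agent/commands/check-raid-disk
    Firmware state: Online, Spun Up
    Firmware state: Online, Spun Up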

Slide 30

Slide 30 text

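mackerel-plugin-ntpq • Visualizes the offset (absolute value) and the sync status with the remote • Alerts on Sync < 0.1 or offset > 300 • Some servers drift gradually over time, and the visualization is handy for tuning ntp.conf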

Slide 31

Slide 31 text

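mackerel-plugin-linux-lite • Our servers range from 2 cores to 56 cores • Metrics are shaped so that consistent monitoring thresholds are easy to set • Free memory • CPU usage capped at 100% • Load average per core • Number of processes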
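The per-core normalization is simple arithmetic: divide the load average by the core count so that one threshold can apply to a 2-core and a 56-core host alike. The sketch below only illustrates that calculation. The function name load_per_core is hypothetical and this is not the plugin's actual code.

```shell
#!/bin/sh
# Hypothetical sketch: normalize a load average by the number of cores.
load_per_core() {
    loadavg=$1; cores=$2
    awk -v l="$loadavg" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

# On Linux: load_per_core "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```

A load average of 28 is alarming on 2 cores but only half of capacity on 56 cores, which is why a raw load threshold does not transfer between hosts.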

Slide 32

Slide 32 text

Plugins used outside z-common

Slide 33

Slide 33 text

periodic-checker • Runs a check only during specified hours • Handles things like daily maintenance windows • Would love to have this in mackerel-agent itself
    $ periodic-checker --range 00:00-06:00,06:30-24:00 \
      -- check-tcp -p 80
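The time-window test at the heart of such a wrapper can be sketched as follows. The function name in_time_range is hypothetical, and treating the start of a range as inclusive and the end as exclusive is an assumption; the real periodic-checker may handle range boundaries differently.

```shell
#!/bin/sh
# Hypothetical sketch: succeed (return 0) when HH:MM falls inside any of the
# comma-separated HH:MM-HH:MM ranges. Start is inclusive, end is exclusive.
in_time_range() {
    now=$1; ranges=$2
    echo "$ranges" | awk -v now="$now" -F, '
        function minutes(t,  p) { split(t, p, ":"); return p[1] * 60 + p[2] }
        {
            n = minutes(now)
            for (i = 1; i <= NF; i++) {
                split($i, r, "-")
                if (n >= minutes(r[1]) && n < minutes(r[2])) found = 1
            }
        }
        END { exit (found ? 0 : 1) }'
}

# Run the wrapped check only inside the window, otherwise report OK:
# if in_time_range "$(date +%H:%M)" "00:00-06:00,06:30-24:00"; then
#     check-tcp -p 80
# else
#     echo "OK: outside monitoring window"
# fi
```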

Slide 34

Slide 34 text

check-dns-rr • Monitors services that run on consul + DNS • Like check-resolver, it reads resolv.conf on its own • Alerts when the number of hosts falls to the threshold or below
    $ /etc/mackerel-agent/commands/check-dns-rr \
      --host=production.app.service.dc.consul -w 2 -c 2
    OK: 3 dns-rr hosts found

Slide 35

Slide 35 text

check-spf-and-reserve-lookup • Used for mail delivery • Confirms that the IP in question is included in the SPF record and that a reverse lookup of the IP returns a name containing the domain
    $ /usr/local/bin/check-spf-and-reserve-lookup 192.168.1.1 mercari.jp
    NG: spf check failed: result=SoftFail
    NG: reverse lookup dig failed: no result

Slide 36

Slide 36 text

check-spf-and-reserve-lookup-all • Checks every Global IP the server has
    #!/bin/bash
    set -e
    for ip in $(ip addr | grep 'inet ' | fgrep -v 'inet 127.0.0.1' | grep -v -E '^ *inet (10\.|192\.168|10\.|172\.1[6789]\.|172\.2[0-9]\.|172\.3[01]\.)' | sort | awk '{print $2}' | awk -F / '{print $1}')
    do
      /usr/local/bin/check-spf-and-reserve-lookup $ip mercari.jp
    done
    echo "OK: ALL"

Slide 37

Slide 37 text

check-mysql-slave-sql-error • A plugin built because "it would be handy to be told the reason as well when replication stops" • Supports Multi Source Replication
    $ /usr/local/bin/check-mysql-slave-sql-error --user=monitor --password=xxx
    mysql-slave-sql-error - MySQL slave SQL error CRITICAL: Last_SQL_Error found: Error 'Table 'tmp_replication_stop' already exists' on query. Default database: 'mercari'. Query: 'CREATE TABLE tmp_replication_stop ...

Slide 38

Slide 38 text

check-mysql-msr • Monitors MySQL Multi Source Replication • Alerts if even one source has stopped or lags beyond the threshold
    $ /usr/local/bin/check-mysql-msr --host=127.0.0.1 --port=3306 --user=monitor --password=xxx -w 1 -c 1
    MySQL Multi Source Replication OK: [O]
     admin-db=io:Yes,sql:Yes,behind:0
     main-db=io:Yes,sql:Yes,behind:0
     web-db=io:Yes,sql:Yes,behind:0
    (the actual output is a single line)

Slide 39

Slide 39 text

mackerel-plugin-msr • Visualization counterpart of check-mysql-msr • If monitors could be set on mysql-msr.behind.*, check-mysql-msr might not be needed...?

Slide 40

Slide 40 text

Open Source! https://github.com/kazeburo/ • Let me know if anything is missing or if there is something you would like to see • https://github.com/kazeburo/custom-mackerel-plugins

Slide 41

Slide 41 text

Other initiatives

Slide 42

Slide 42 text

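Monitoring the number of customer inquiries • Notifications go to a Channel that many people have joined • Detects incidents and helps gauge their scope of impact • Not only SRE but every team gets involved, which speeds up the response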

Slide 43

Slide 43 text

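Automatically extracting unmonitored servers • Comparing the Mackerel API with a list of IP addresses gives the answer • On AWS/GCE, the list of private IPs can be obtained through each cloud's API • On Sakura dedicated servers and Sakura Cloud it cannot be obtained • By design, you manage the private network yourself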

Slide 44

Slide 44 text

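Reachability checks with fping • fping is a command that checks reachability in parallel • Run fping per /24 network to build the list of IPs • Parallelizing fping itself shortens the run further
    $ seq 0 255 | xargs -I{} -P 16 fping -q -a -r 2 -i 10 -g x.y.{}.0/24
(the xargs invocation is only an illustration)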
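Once a sweep like this has produced a list of reachable IPs, finding unmonitored servers is a set difference against the IPs Mackerel already knows. The sketch below assumes both lists have been saved as plain files with one IP per line. The function name unmonitored_ips and the file layout are hypothetical, and the step that fetches hosts from the Mackerel API is left out.

```shell
#!/bin/sh
# Hypothetical sketch: print the IPs that are present in the "alive" list
# (for example, fping output) but absent from the "monitored" list
# (for example, addresses exported from the Mackerel API).
unmonitored_ips() {
    alive_file=$1; monitored_file=$2
    # -F: fixed strings, -x: whole-line match, -v: keep lines that do not match
    grep -v -x -F -f "$monitored_file" "$alive_file" | sort -u
}

# Example: unmonitored_ips alive.txt monitored.txt
```

Whole-line fixed-string matching matters here: without -x and -F, 10.0.1.1 would also match 10.0.1.10, and the dots would be treated as regex wildcards.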

Slide 45

Slide 45 text

Notifications to Slack • Posted to Slack twice a day • Servers that are not monitored • Servers in standby or poweroff status • Servers with relaxed disk monitoring • Done for JP/US/UK

Slide 46

Slide 46 text

Summary

Slide 47

Slide 47 text

Solve problems by writing code • Mackerel is a tool that rewards the effort you put into it

Slide 48

Slide 48 text

We’re Hiring! Mercari, taking on the world • SRE: supporting it with multi-layered, redundant, multi-faceted infrastructure and reliability • www.mercari.com/jp/jobs/