Slide 1

Slide 1 text

On Monitoring Mercari's Systems and Services: Monitoring Seminar in Mercari, 2017/Good/Meat, @kazeburo

Slide 2

Slide 2 text

Me • Masahiro Nagano / 長野雅広 • id:kazeburo • Mercari, Inc
 Principal Engineer
 Site Reliability Engineering (SRE) Team

Slide 3

Slide 3 text

Agenda • Mercari's history and its monitoring tools • Server monitoring with Mackerel

Slide 4

Slide 4 text

[Diagram] ~2014/9: JP (Ishikari); Infrastructure team @Tokyo

Slide 5

Slide 5 text

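~2014/9 • Used dedicated servers and cloud at Sakura Internet's Ishikari DC • Built a Zabbix server on a dedicated server • Used "triggers" for a wide range of monitoring • Complex monitoring is possible depending on the condition expression • The basis of today's monitoring items took shape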

Slide 6

Slide 6 text

[Diagram] 2014/9~: US, JP (Ishikari); Infrastructure team @Tokyo

Slide 7

Slide 7 text

2014/9~ • US service launched in the AWS Oregon region • Multi & hybrid setup of dedicated servers and cloud • Built a Zabbix Server in the US as well, monitored from Tokyo

Slide 8

Slide 8 text

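Challenges of multi-region Zabbix • Zabbix configurations drift apart • The Zabbix versions in operation differ • Monitoring items differ slightly between dedicated servers and AWS • Monitoring built up in JP could not be reproduced in US • Incidents caused by monitoring gaps that occurred only in US • Ideas such as using Zabbix Proxy to consolidate into a single Zabbix Server were also considered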

Slide 9

Slide 9 text

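Challenges with Zabbix • Operating Zabbix itself • Burden of version upgrades • Heavy MySQL load, which also caused monitoring delays • Managing complex triggers • Configuration is mostly done by mouse clicks • Want to put it under version control • Want to improve monitoring notifications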

Slide 10

Slide 10 text

[Diagram] 2016/1~: US, JP (Ishikari); SRE @Tokyo

Slide 11

Slide 11 text

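Adopting Mackerel • Started with Service Metrics • Graphs are easy to draw and alert thresholds can be set • Combined with fluentd and Norikra • Ported the Zabbix triggers • Implemented the triggers as plugins • Plugins are managed in Git and distributed with Ansible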
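A minimal sketch of what a ported trigger might look like as a metrics plugin. It assumes the Mackerel agent convention of reading one tab-separated line (name, value, epoch seconds) per metric from the plugin's standard output. The helper names and the metric name example.queue.length are hypothetical and do not come from the talk.

```python
import time


def format_metric(name: str, value: float, epoch: int) -> str:
    """Build one metric line: name, value and epoch seconds, tab-separated."""
    return f"{name}\t{value}\t{epoch}"


def emit(metrics: dict) -> list:
    """Format every metric with the same timestamp and return the lines."""
    now = int(time.time())
    return [format_metric(name, value, now) for name, value in metrics.items()]


# Hypothetical metric name, used only for illustration.
for line in emit({"example.queue.length": 12}):
    print(line)
```

Because a plugin of this shape is just a small script, it can live in Git and be pushed out by configuration management, which matches the Git plus Ansible workflow on the slide.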

Slide 12

Slide 12 text

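Adopting Mackerel • Offloaded operation of the monitoring tool and the time-series DB • Upgraded every week with no effort on our side • Aligned the monitoring items between JP and US • Differences are absorbed with Ansible templates and the like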

Slide 13

Slide 13 text

[Diagram] 2017/3~: US, JP (Ishikari), UK; SRE @Tokyo

Slide 14

Slide 14 text

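2017/3~ • Service launched in the UK • Adopted yet another cloud for the UK launch • Monitoring was already cloud-based, so no new monitoring server had to be added • The JP/US monitoring items applied as-is, so monitoring setup was complete as soon as the infrastructure was built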

Slide 15

Slide 15 text

[Diagram] Now: US, JP (Ishikari), UK; SRE, Engineers @JP US UK; Stackdriver, Prometheus

Slide 16

Slide 16 text

Now • Moving to microservices • Using Stackdriver, Prometheus, and Datadog to monitor containers and services on GKE • Server-side engineers also use the monitoring tools

Slide 17

Slide 17 text

Other monitoring • New Relic: traces PHP internals; a reference for application tuning • Kurado: convenient because many graph images can be viewed at once; basic metrics are viewed here

Slide 18

Slide 18 text

Server monitoring with Mackerel

Slide 19

Slide 19 text

https://speakerdeck.com/kazeburo/mackerel-day

Slide 20

Slide 20 text

Monitoring by the numbers • Monitoring rules: 278 • Monitoring rules per host • MySQL: 34 • Application: 39 • Search: 37 • Custom plugins: 50+ (check + metrics + utils)

Slide 21

Slide 21 text

MySQL monitoring items (1/4) • Connectivity • FileSystem % >85% >88% • Swap % >50% >70% • ssh-alive: sshd process monitoring • global-ip-and-iptables: presence of a global IP and iptables state • unbound-resolv: whether the local unbound can resolve names • unbound-process: unbound process monitoring • crond-process: crond process monitoring • uptime: reboot monitoring
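The two figures after each item read like a warning threshold followed by a critical threshold. That reading is an assumption, since the slide text does not label them. Under that assumption, a pair such as >85% >88% could be evaluated roughly as follows (the function name is illustrative).

```python
def classify_above(value: float, warning: float, critical: float) -> str:
    """Classify a metric that is unhealthy when it rises above a threshold."""
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


# FileSystem % with the thresholds from the slide (>85% >88%),
# assuming the first figure is warning and the second is critical.
for used in (70.0, 86.5, 91.0):
    print(used, classify_above(used, warning=85.0, critical=88.0))
```

The comparison is strict, so a value exactly at the threshold still counts as healthy. Whether the real rules are strict or inclusive is not stated in the slides.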

Slide 22

Slide 22 text

MySQL monitoring items (2/4) • inode-usage: inode usage >80% >90% • uname-change: diff monitoring of the uname command output • passwd-change: diff monitoring of the passwd file • hostname-changed: diff monitoring of the hostname command output • custom.ntpq.synced.remote <0.1 <0.1, custom.ntpq.offset.seconds >300 >300 (msec): NTP sync server and clock offset • custom.linux-lite.memory.avail <50MB <20MB: free memory • custom.linux-lite.cpu-usage.cpu-steal >20% >20%, custom.linux-lite.cpu-usage.cpu-iowait >30% >50%, custom.linux-lite.cpu-usage.cpu-system >8% >8%: CPU usage of each type (100% is the upper limit)
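The diff checks (uname-change, passwd-change, hostname-changed) all follow the same idea: remember what was seen last time and alert when it differs. A minimal sketch of that idea, assuming the previous value is kept in a small state file. The function name, the state file location, and the sample strings are hypothetical; the talk does not show how the real plugins store state.

```python
import tempfile
from pathlib import Path


def changed_since_last_run(current: str, state_file: Path) -> bool:
    """Compare current with the value saved by the previous run, then save it.

    The first run only records the value and reports no change.
    """
    previous = state_file.read_text() if state_file.exists() else None
    state_file.write_text(current)
    return previous is not None and previous != current


# Illustration with made-up kernel strings instead of real command output.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "uname.state"
    print(changed_since_last_run("Linux 4.4.0-1", state))  # first run, nothing to compare
    print(changed_since_last_run("Linux 4.4.0-1", state))  # unchanged
    print(changed_since_last_run("Linux 4.4.0-2", state))  # value differs from last run
```

A check plugin built on this would typically exit with a non-zero status when the function returns True, so that an unexpected kernel, hostname, or passwd change raises an alert.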

Slide 23

Slide 23 text

MySQL monitoring items (3/4) • custom.linux-lite.loadavg.per-cpu >3 >3: load average divided by the number of cores • postfix-smtp-alive: SMTP port check • postfix-master-process: postfix master process monitoring • custom.postfix.mailq.queue >100 >5k: number of mails held in the postfix queue • custom.linux-lite.process.all >2k >2k, custom.linux-lite.process.running >60 >100: total number of processes and number of running processes • mysql-uptime: MySQL uptime • custom.mysql-lite.replication-threads.io <0.2 <0.2, custom.mysql-lite.replication-threads.sql <0.2 <0.2: state of each replication thread
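Dividing the load average by the core count lets one threshold (>3 here) apply to hosts of any size. A small sketch of that calculation, assuming the 1-minute load average is used; the slide does not say which window the real plugin reads, and the function names are illustrative.

```python
import os


def loadavg_per_cpu(load1: float, cpu_count: int) -> float:
    """Divide a load average by the number of cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load1 / cpu_count


def current_loadavg_per_cpu() -> float:
    """Read the 1-minute load average of this host (Unix only) and normalize it."""
    load1, _, _ = os.getloadavg()
    return loadavg_per_cpu(load1, os.cpu_count() or 1)


# A load of 12 on a 4-core host is 3.0 per core, right at the slide's threshold.
print(loadavg_per_cpu(12.0, 4))
```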

Slide 24

Slide 24 text

MySQL monitoring items (4/4) • custom.mysql-lite.replication-behind-master.second >5 >5: MySQL replication delay • custom.mysql-lite.connections.utilization >90 >90: number of connections relative to max_connections • custom.mysql-lite.threads.running >1k >2k: number of threads currently running in MySQL • mysql-slave-sql-error: replication error monitoring • machine-exceptions: server memory error monitoring • raid-disks: server RAID/disk state monitoring
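The connections.utilization item is described only as the number of connections relative to max_connections. One plausible reading, stated here as an assumption, is a percentage computed from the current connection count (for example MySQL's Threads_connected status value) and the max_connections setting. The function name is illustrative.

```python
def connection_utilization(connected: int, max_connections: int) -> float:
    """Return current connections as a percentage of max_connections."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return 100.0 * connected / max_connections


# With max_connections = 2000, 1900 open connections is 95%,
# which is above the >90 threshold on the slide.
print(connection_utilization(1900, 2000))
```

Alerting on a percentage rather than a raw count means the rule does not need to change when max_connections is raised on a larger instance.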

Slide 25

Slide 25 text

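How to grow your monitoring • "Monitoring is continuous testing" (by kazuho) • Build the monitoring "test-first" while building the system • Process monitoring and port monitoring • Grow the monitoring as follow-up actions from postmortems • Threshold tuning, L7-level monitoring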
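In the test-first spirit described above, a port check can be written before the service exists: it fails first, then passes once the service is up. A minimal sketch of such a check using a plain TCP connect. The function name and the timeout value are illustrative and not from the talk.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Demonstration against a throwaway local listener on an ephemeral port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    print(port_is_open("127.0.0.1", server.getsockname()[1]))
```

A TCP connect only proves that something is listening. The L7-level monitoring mentioned on the slide goes one step further by sending a real request and checking the response.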

Slide 26

Slide 26 text

end