
A Talk on Hatena's Server Management Tools



The design philosophy behind Hatena's server management tool and the implementation of its server metrics visualization system


Yuuki Tsubouchi (yuuk1)

September 20, 2013



Transcript

  1. Full-stack tools: Zabbix (host management + monitoring), Crowbar (Ganglia + Chef + Nagios).

    Hard to combine with the tools you actually want to use. Their features are tightly coupled.
  2. Mackerel: [countable noun] a mackerel (saba); [uncountable noun] mackerel (the flesh).

    Plack / Starlet / Teng / Router::Simple / Class::Accessor::Lite::Lazy / TheSchwartz / Text::Xslate / Config::ENV / Scope::Container / SQL::Maker ... Perl 5.14 / Carton 1.0
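  3. Redundancy and load balancing. Diagram labels: Role, virtual IP (VIP), LVS; Role, DNS round robin, DNS.

    DNS: for a round-robin FQDN, one of several IPs is chosen in rotation and returned. LVS: a virtual IP is assigned to the LVS host; as long as you know the virtual IP, the rest is distributed automatically.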
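The round-robin behaviour described on slide 3 can be pictured with a toy model. This is only a sketch of the rotation idea, not how a DNS server is implemented; `make_round_robin` and the documentation-range addresses are made up for illustration.

```perl
use strict;
use warnings;

# Toy model of DNS round robin: each lookup returns the next IP in the pool.
# The addresses used below are placeholders (RFC 5737 range), not real hosts.
sub make_round_robin {
    my @ips  = @_;
    my $next = 0;
    return sub {
        my $ip = $ips[$next];
        $next = ($next + 1) % @ips;    # wrap around at the end of the pool
        return $ip;
    };
}

my $resolve = make_round_robin(qw(192.0.2.1 192.0.2.2 192.0.2.3));
print $resolve->(), "\n" for 1 .. 4;
```

The fourth lookup wraps around to the first address again, which is what spreads clients across the pool.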
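  4. SNMP: fetch the basic metrics that the OS provides, plus others such as routers and Squid. Uses Net::SNMP.

        use Carp qw(croak);
        use Net::SNMP;

        my ($session, $error) = Net::SNMP->session(
            -hostname  => $hostname,
            -community => $community,
            -version   => 2,
            -timeout   => 10,
            -translate => 0x0,
        );
        defined $session or croak "SNMP error: $error";

        my $response = $session->get_request(
            -varbindlist => $mibs,    # several MIB objects can be requested at once
        ) or croak "SNMP error: " . $session->error;    # method calls do not interpolate in strings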
  5. Nginx HttpStubStatusModule: Nginx-specific metrics can be fetched over HTTP.

        $ curl http://nginxhost:8080/nginx_status
        Active connections: 291
        server accepts handled requests
         16630948 16630948 31070465
        Reading: 6 Writing: 179 Waiting: 106
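The status text on slide 5 is simple enough to parse with a few regular expressions. The following is a minimal sketch that assumes the body has already been fetched; `parse_stub_status` and the hash key names are illustrative choices, not part of Nginx or of the original tool.

```perl
use strict;
use warnings;

# Parse the text returned by the Nginx stub status page into a hash ref.
sub parse_stub_status {
    my ($body) = @_;
    my %m;
    ($m{active}) = $body =~ /Active connections:\s*(\d+)/;
    # The counters line consists of exactly three integers.
    @m{qw(accepts handled requests)} = $body =~ /^\s*(\d+)\s+(\d+)\s+(\d+)\s*$/m;
    @m{qw(reading writing waiting)}
        = $body =~ /Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)/;
    return \%m;
}

my $sample = <<'EOS';
Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
EOS

my $metrics = parse_stub_status($sample);
printf "%s=%s\n", $_, $metrics->{$_} for sort keys %$metrics;
```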
  6. Plack: Plack::Middleware::ServerStatus::Lite offers a JSON format, which makes this easy.

        % curl http://server:port/server-status?json
        {"Uptime":"1332476669","BusyWorkers":"2",
         "stats":[
          {"protocol":null,"remote_addr":null,"pid":"78639",
           "status":".","method":null,"uri":null,"host":null,"ss":null},
          {"protocol":"HTTP/1.1","remote_addr":"127.0.0.1","pid":"78640",
           "status":"A","method":"GET","uri":"/","host":"localhost:10226","ss":0},
          ...
         ],"IdleWorkers":"3"}
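As a rough illustration of why the JSON format on slide 6 is convenient, the sketch below decodes a trimmed copy of that response with the core JSON::PP module and pulls out the worker counts. `summarize_status` is a hypothetical helper, and counting entries with status "A" simply mirrors the sample, where that value appears on the worker serving a request.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Summarise a server-status JSON document: worker counters plus the number
# of scoreboard entries whose status field is "A".
sub summarize_status {
    my ($json) = @_;
    my $doc    = decode_json($json);
    my $active = grep { ($_->{status} // '') eq 'A' } @{ $doc->{stats} || [] };
    return {
        busy   => $doc->{BusyWorkers} + 0,    # counters arrive as strings
        idle   => $doc->{IdleWorkers} + 0,
        active => $active,
    };
}

my $sample = '{"Uptime":"1332476669","BusyWorkers":"2","stats":['
           . '{"pid":"78639","status":".","uri":null},'
           . '{"pid":"78640","status":"A","uri":"/"}],"IdleWorkers":"3"}';

my $s = summarize_status($sample);
print "busy=$s->{busy} idle=$s->{idle} active=$s->{active}\n";
```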
  7. Memcached: Telnet interface, stats command.

        stats
        STAT pid 14868
        STAT uptime 175931
        STAT time 1220540125
        STAT version 1.2.2
        ...
        STAT curr_connections 92
        STAT total_connections 1740
        STAT connection_structures 165
        STAT cmd_get 7411
        STAT cmd_set 28445156
        STAT get_hits 5183
        STAT get_misses 2228
        STAT evictions 0
        STAT bytes_read 2112768087
        STAT bytes_written 1000038245
        STAT limit_maxbytes 52428800
        STAT threads 1
        END

        use Carp qw(croak);
        use IO::Socket::INET;

        my $sock = IO::Socket::INET->new(
            PeerAddr => "$hostname:$port",
            Proto    => 'tcp',
            Timeout  => 10,
        ) or croak "Couldn't connect to $hostname:$port";

        $sock->print("stats\r\n");

        my $value_by_stat = {};
        while (my $line = $sock->getline) {
            last if $line =~ /^END/;
            $line =~ s/\r?\n$//;    # chomp, also handling CRLF
            if ($line =~ /^STAT\s+(\S*)\s+(.*)/) {
                $value_by_stat->{$1} = $2;    # e.g. get_hits => 5183
            }
        }
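Once the stats are in a hash like `$value_by_stat` on slide 7, derived metrics are a short step away. As one possible example (the slides do not say which derived values were graphed), this sketch computes a cache hit ratio; `hit_ratio` is a name made up here.

```perl
use strict;
use warnings;

# Derive a cache hit ratio from a hash of memcached stats.
# Returns undef when there have been no lookups at all.
sub hit_ratio {
    my ($stats) = @_;
    my $lookups = $stats->{get_hits} + $stats->{get_misses};
    return $lookups ? $stats->{get_hits} / $lookups : undef;
}

# Values taken from the sample stats output on the slide.
my $value_by_stat = { get_hits => 5183, get_misses => 2228, cmd_get => 7411 };
printf "hit ratio: %.1f%%\n", 100 * hit_ratio($value_by_stat);
```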
  8. Agent: each Job is called an Agent, a process that fetches metrics from a remote host.

    Supported middleware and protocols: Apache, Nginx, MySQL, Munin, Latency, Plack, Perlbal, Redis, SNMP, Solr, TheSchwartz. Pluggable: Hatena::Mackerel::Worker::Agent::XXX. Metrics are fetched over SNMP, HTTP, or Telnet.
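One common way to get the pluggable structure slide 8 describes is to put one package per agent under a shared namespace and dispatch by name. The sketch below shows that pattern only; the `Example::Agent::*` packages, the `fetch` method, and `run_agent` are all hypothetical and are not taken from the Hatena code base.

```perl
use strict;
use warnings;

# Hypothetical plugin layout: one package per agent type, each with fetch($host).
package Example::Agent::Nginx {
    sub fetch {
        my ($class, $host) = @_;
        # A real agent would query the host here; this stub returns fixed data.
        return { host => $host, source => 'nginx' };
    }
}

package Example::Agent::Redis {
    sub fetch {
        my ($class, $host) = @_;
        return { host => $host, source => 'redis' };
    }
}

package main;

# Resolve the agent class from its short name and run it against a host.
sub run_agent {
    my ($type, $host) = @_;
    my $class = "Example::Agent::$type";
    die "unknown agent: $type\n" unless $class->can('fetch');
    return $class->fetch($host);
}

my $result = run_agent('Nginx', 'web01');
print "$result->{source} metrics from $result->{host}\n";
```

Adding support for another middleware then means adding one more package, with no change to the dispatcher.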
  9. [Diagram: cron, enqueue.pl, Job Queue, DB, RRDtool, and four Hosts.

    Labels: job submission, host information retrieval, metrics retrieval.]
  10. [Diagram of the overall system: cron, enqueue.pl, Job Queue, DB, RRD, four Hosts,

    RRDtool, App Server, Browser.]
  11. rrdtool graph

        $ rrdtool graph --end now --start end-120000s --width 400 \
            DEF:ds0a=/home/rrdtool/data/router1.rrd:ds0:AVERAGE \
            DEF:ds0b=/home/rrdtool/data/router1.rrd:ds0:AVERAGE:step=1800 \
            DEF:ds0c=/home/rrdtool/data/router1.rrd:ds0:AVERAGE:step=7200 \
            LINE1:ds0a#0000FF:"default resolution\l" \
            LINE1:ds0b#00CCFF:"resolution 1800 seconds per interval\l" \
            LINE1:ds0c#FF00FF:"resolution 7200 seconds per interval\l"

    Annotations on the slide: specify the period to draw; specify the graph width; specify the data source (RRD file name); specify line thickness, color, and legend; ...
  12. The specification is defined rigorously in BNF.

        my $fmt = <<'EOS';
        graphformula = "[graph:" ( graphs [ ":" option ] ) "]"
        graphs       = graph | "(" graph ")" 0*( ",(" graph ")" )
        graph        = "path:" elements | instruction ":" [ name ] ":::" option )
        elements     = object : [ tag : [ label ] ]
        object       = name
        tag          = name
        label        = name
        instruction  = "def" | "cdef" | "vdef" | "line" digit | "area" | "hrule" | "vrule" | "print" | "gprint" | "commend" | "tick" | "shift" | "textalign"
        option       = 1*( char | "," | "=" | "#" | "@" | ":" | " " | "\" | op)
        name         = 1*char
        char         = alphanum | mark
        op           = "+" | "/" | "%" | "*"
        mark         = "-" | "_" | "." | "!" | "~" | "*" | "'"
        alphanum     = alpha | digit
        alpha        = lowalpha | upalpha
        lowalpha     = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
        upalpha      = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
        digit        = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
        EOS
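  13. Disk I/O load of RRD: (number of hosts) x (number of metric types) I/Os per 5 minutes.

    Number of hosts = several thousand; number of metric types = 20 or more; 10k io per 5 minutes. In practice, the I/O is concentrated in one part of each 5-minute window.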
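The estimate on slide 13 is a plain multiplication, worked through below. The input figures are illustrative placeholders rather than Hatena's actual counts, and the per-second average assumes updates are spread evenly, which the slide notes is not the case in practice.

```perl
use strict;
use warnings;

# One RRD update per (host, metric type) pair in each collection interval.
sub updates_per_interval {
    my ($hosts, $metric_types) = @_;
    return $hosts * $metric_types;
}

# Average rate if the updates were spread evenly over the interval.
sub average_per_second {
    my ($updates, $interval_sec) = @_;
    return $updates / $interval_sec;
}

my $updates = updates_per_interval(1_000, 20);    # illustrative inputs
printf "%d updates per 5 minutes, %.1f per second on average\n",
    $updates, average_per_second($updates, 300);
```

Because real updates cluster within the window, the peak rate can be far above this even-spread average.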
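  14. rrdcached: a daemon that takes over rrdtool's update and aggregation work. RRD updates tend to arrive all at the same moment; rrdcached delays each write by a random amount of time.

    Without rrdcached: 700 iops on average, 1200 iops at burst. That is manageable on SSD but tough on HDD (AWS). The released version (1.4.8) only supports update, but trunk supports the other subcommands as well.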
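The effect of delaying each write by a random amount can be seen in a small simulation. This sketch models only the idea from slide 14 and says nothing about how rrdcached is implemented; `jitter`, `peak_per_second`, and the numbers are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Count how many writes land in the busiest second.
sub peak_per_second {
    my @times = @_;
    my %count;
    $count{$_}++ for @times;
    return max(values %count);
}

# Delay each write by a random whole number of seconds in [0, $max_delay).
sub jitter {
    my ($max_delay, @times) = @_;
    return map { $_ + int(rand($max_delay)) } @times;
}

srand(42);                            # fixed seed so a run is repeatable
my @arrivals = (0) x 1000;            # 1000 updates arriving in the same second
my @delayed  = jitter(300, @arrivals);

printf "peak without delay: %d/s, with delay: %d/s\n",
    peak_per_second(@arrivals), peak_per_second(@delayed);
```

The total number of writes is unchanged; only the burst is flattened across the window.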
  15. $ rrdtool graph ./dskUsed.png \
        --start="now-1y" \
        --end="now+2y" \
        --imgformat=PNG \
        --title="Disk Usage" \
        --height=200 \
        --width=500 \
        --lower-limit=0 \
        DEF:usage=testhost__dskUsed.rrd:value:MAX \
        DEF:total=testhost__dskTotal.rrd:value:MAX \
        CDEF:c_warn=total,0.85,* \
        CDEF:c_crit=total,0.95,* \
        VDEF:v_total=total,MINIMUM \
        VDEF:v_warn=c_warn,MINIMUM \
        VDEF:v_crit=c_crit,MINIMUM \
        VDEF:v_usage_slope=usage,LSLSLOPE \
        VDEF:v_usage_intercept=usage,LSLINT \
        CDEF:c_usage_predict=usage,POP,v_usage_slope,COUNT,*,v_usage_intercept,+ \
        CDEF:c_rwarn=c_usage_predict,v_warn,v_total,LIMIT \
        VDEF:v_rwarn=c_rwarn,FIRST \
        CDEF:c_rcrit=c_usage_predict,v_crit,v_total,LIMIT \
        VDEF:v_rcrit=c_rcrit,FIRST \
        HRULE:v_warn#FF8800:"warning":dashes=5 \
        VRULE:v_rwarn#FF8800::dashes=5 \
        HRULE:v_crit#FF4400:"critical" \
        VRULE:v_rcrit#FF4400 \
        HRULE:v_total#FF0000:"total" \
        AREA:usage#00FF00:"Disk Usage" \
        LINE1:c_usage_predict#0000FF:"Predict" \
        GPRINT:v_rwarn:"Reach warning (85%)\: %c":strftime \
        GPRINT:v_rcrit:"Reach critical (95%)\: %c":strftime

    shoichimasuhara++
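The LSLSLOPE and LSLINT terms in the command on slide 15 fit a least-squares line to the usage data, and the prediction extends that line until it crosses the 85% and 95% thresholds. The sketch below reproduces the arithmetic in plain Perl under simplifying assumptions: x is a 0-based sample index rather than a timestamp, and `fit_line`, `reach_index`, and the sample values are illustrative.

```perl
use strict;
use warnings;

# Ordinary least-squares fit of y = slope * x + intercept,
# where x is the 0-based sample index. Needs at least two samples.
sub fit_line {
    my @y = @_;
    my $n = @y;
    my ($sx, $sy, $sxx, $sxy) = (0, 0, 0, 0);
    for my $x (0 .. $n - 1) {
        $sx  += $x;
        $sy  += $y[$x];
        $sxx += $x * $x;
        $sxy += $x * $y[$x];
    }
    my $slope     = ($n * $sxy - $sx * $sy) / ($n * $sxx - $sx * $sx);
    my $intercept = ($sy - $slope * $sx) / $n;
    return ($slope, $intercept);
}

# Index at which the fitted line reaches $threshold (undef if not growing).
sub reach_index {
    my ($threshold, $slope, $intercept) = @_;
    return $slope > 0 ? ($threshold - $intercept) / $slope : undef;
}

my @usage = (10, 20, 30, 40);    # illustrative disk usage samples
my $total = 100;                 # illustrative disk size
my ($slope, $intercept) = fit_line(@usage);
printf "slope=%.1f intercept=%.1f warning (85%%) reached at x=%.1f\n",
    $slope, $intercept, reach_index(0.85 * $total, $slope, $intercept);
```

With these samples the fit is slope 10 and intercept 10, so the 85% line is crossed at x = 7.5, a few samples past the end of the data.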
  16. D3.js, NVD3.js (Re-usable charts for d3.js), Rickshaw (time-series graph).

    Because rrdtool xport --json can return values as JSON, the storage stays RRD. "D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS."
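To hand exported values to a JavaScript charting library, the JSON has to be reshaped into per-series points. The sketch below assumes an xport-style document with `meta.start`, `meta.step`, `meta.legend`, and a `data` array holding one row per step; that shape, the timestamp convention (row i at start + i * step), and the `xport_to_series` helper are assumptions to check against the rrdtool version in use. The `{x, y}` point form is the one Rickshaw series use.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Turn xport-style JSON into one list of {x, y} points per legend entry.
sub xport_to_series {
    my ($json) = @_;
    my $doc  = decode_json($json);
    my $meta = $doc->{meta};
    my %series;
    for my $row_idx (0 .. $#{ $doc->{data} }) {
        my $t   = $meta->{start} + $row_idx * $meta->{step};   # assumed convention
        my $row = $doc->{data}[$row_idx];
        for my $col (0 .. $#$row) {
            push @{ $series{ $meta->{legend}[$col] } }, { x => $t, y => $row->[$col] };
        }
    }
    return \%series;
}

# Hand-written sample in the assumed shape, not real rrdtool output.
my $sample = '{"meta":{"start":1379600000,"step":300,"legend":["cpu"]},'
           . '"data":[[1.5],[2.25],[null]]}';

my $series = xport_to_series($sample);
printf "%s: %d points, first at x=%d\n",
    $_, scalar @{ $series->{$_} }, $series->{$_}[0]{x} for sort keys %$series;
```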
  17. Alternatively, use storage such as Redis or MongoDB. FnordMetrics (http://fnordmetric.io/): similar to GrowthForecast. Storage: Redis. Graphs: Rickshaw.

        use Furl;
        use JSON::PP;    # assumption: any module that exports encode_json works here

        my $furl = Furl->new(timeout => 10);
        my $res  = $furl->post(
            "http://fnordmetricshost:4242/events",
            ['Content-Type' => 'application/json'],
            encode_json({
                "_type" => '_incr',
                "value" => scalar @$statuses,    # number of statuses in this batch
                "gauge" => "tweets_per_minute",
            }),
        );