The Story of Hatena's Server Management Tools

The philosophy behind Hatena's server management tools and the implementation of its server metrics visualization system

Yuuki Tsubouchi (yuuk1)

September 20, 2013

Transcript

  1. 20.
  2. 23.

    Full-stack tools
      Zabbix (host management + monitoring)
      Crowbar (Ganglia + Chef + Nagios)
    Hard to combine with the tools you actually want to use
    Each function is tightly coupled
  3. 26.

    Mackerel
      [countable noun] mackerel (the fish)
      [uncountable noun] mackerel (the meat)
    Plack / Starlet / Teng / Router::Simple / Class::Accessor::Lite::Lazy / TheSchwartz / Text::Xslate / Config::ENV / Scope::Container / SQL::Maker ...
    Perl 5.14 / Carton 1.0
  4. 40.
  5. 42.
  6. 44.

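    Redundancy and load balancing
    Role / virtual IP (VIP) / LVS
    Role / DNS round robin / DNS
    DNS round robin: for the round-robin FQDN, one of several IPs is returned in round-robin order
    LVS: a virtual IP is assigned to the LVS host
    As long as the virtual IP is known, the rest is balanced automatically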
  7. 53.
  8. 58.
  9. 69.

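    SNMP
    Fetches the basic metrics provided by the OS
    Also routers, Squid, and so on
    Uses Net::SNMP

      use Carp qw(croak);
      use Net::SNMP;

      # In list context session() returns (undef, $error) on failure
      my ($session, $error) = Net::SNMP->session(
          -hostname  => $hostname,
          -community => $community,
          -version   => 2,
          -timeout   => 10,
          -translate => 0x0,
      );
      $session // croak "SNMP error: $error";

      # A method call is not interpolated inside a string, so concatenate it
      my $response = $session->get_request(
          -varbindlist => $mibs,  # multiple MIB objects can be specified
      ) || croak "SNMP error: " . $session->error;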
  10. 70.

    Nginx
    HttpStubStatusModule
    Nginx-specific metrics can be fetched over HTTP

      $ curl http://nginxhost:8080/nginx_status
      Active connections: 291
      server accepts handled requests
       16630948 16630948 31070465
      Reading: 6 Writing: 179 Waiting: 106
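The stub_status text above is easy to turn into named metrics with a few regular expressions. The following is a minimal sketch, not the code from the talk: `parse_stub_status` and the hash key names are hypothetical, and only core Perl is assumed.

```perl
use strict;
use warnings;

# Parse the text returned by Nginx's stub_status page into a hash reference.
# parse_stub_status is a hypothetical helper name, not part of any library.
sub parse_stub_status {
    my ($body) = @_;
    my %m;

    ($m{active}) = $body =~ /^Active connections:\s+(\d+)/m;

    # The three counters sit on the line after the "server accepts ..." header.
    @m{qw(accepts handled requests)} =
        $body =~ /^server accepts handled requests\s*\n\s*(\d+)\s+(\d+)\s+(\d+)/m;

    @m{qw(reading writing waiting)} =
        $body =~ /Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)/;

    return \%m;
}

# Sample taken from the slide.
my $sample = <<'EOS';
Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
EOS

my $metrics = parse_stub_status($sample);
printf "%s=%s\n", $_, $metrics->{$_} for sort keys %$metrics;
```

The accepts, handled and requests values are cumulative counters, so a collector would normally store them as counter-type data sources rather than gauges.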
  11. 71.

    Plack
    Plack::Middleware::ServerStatus::Lite
    A JSON format is available, which makes this easy

      % curl http://server:port/server-status?json
      {"Uptime":"1332476669","BusyWorkers":"2",
       "stats":[
         {"protocol":null,"remote_addr":null,"pid":"78639",
          "status":".","method":null,"uri":null,"host":null,"ss":null},
         {"protocol":"HTTP/1.1","remote_addr":"127.0.0.1","pid":"78640",
          "status":"A","method":"GET","uri":"/","host":"localhost:10226","ss":0},
         ...
       ],"IdleWorkers":"3"}
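Because the response is JSON, an agent only needs to decode it and pick out the fields it wants. The sketch below assumes the field names seen in the sample response above; `summarise_server_status` is a hypothetical helper, and JSON::PP is used only because it ships with Perl 5.14.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Summarise a ServerStatus::Lite style JSON response.
# summarise_server_status is a hypothetical helper name.
sub summarise_server_status {
    my ($json_text) = @_;
    my $status = decode_json($json_text);

    # Count workers per status character found in the "stats" array.
    my %by_state;
    $by_state{ $_->{status} }++ for @{ $status->{stats} || [] };

    return {
        busy_workers => $status->{BusyWorkers} + 0,   # values arrive as strings
        idle_workers => $status->{IdleWorkers} + 0,
        uptime       => $status->{Uptime} + 0,
        by_state     => \%by_state,
    };
}

# Shortened version of the sample on the slide (the elided entries are left out).
my $sample = <<'EOS';
{"Uptime":"1332476669","BusyWorkers":"2",
 "stats":[
  {"protocol":null,"remote_addr":null,"pid":"78639","status":".","method":null,"uri":null,"host":null,"ss":null},
  {"protocol":"HTTP/1.1","remote_addr":"127.0.0.1","pid":"78640","status":"A","method":"GET","uri":"/","host":"localhost:10226","ss":0}
 ],"IdleWorkers":"3"}
EOS

my $summary = summarise_server_status($sample);
print "busy=$summary->{busy_workers} idle=$summary->{idle_workers}\n";
```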
  12. 72.

    Memcached
    Telnet interface

      stats
      STAT pid 14868
      STAT uptime 175931
      STAT time 1220540125
      STAT version 1.2.2
      ...
      STAT curr_connections 92
      STAT total_connections 1740
      STAT connection_structures 165
      STAT cmd_get 7411
      STAT cmd_set 28445156
      STAT get_hits 5183
      STAT get_misses 2228
      STAT evictions 0
      STAT bytes_read 2112768087
      STAT bytes_written 1000038245
      STAT limit_maxbytes 52428800
      STAT threads 1
      END

      use Carp qw(croak);
      use IO::Socket::INET;

      my $sock = IO::Socket::INET->new(
          PeerAddr => "$hostname:$port",
          Proto    => 'tcp',
          Timeout  => 10,
      ) or croak "Couldn't connect to $hostname:$port";

      # Send the "stats" command and read "STAT <name> <value>" lines until END
      $sock->print("stats\r\n");
      my $value_by_stat = {};
      while (my $line = $sock->getline) {
          last if $line =~ /^END/;
          $line =~ s/\r?\n\z//;  # strip the line terminator (CRLF or LF)
          if ($line =~ /^STAT\s+(\S*)\s+(.*)/) {
              $value_by_stat->{$1} = $2;
          }
      }
  13. 74.
  14. 78.

    Agent
    Each Job is called an Agent
    A process that fetches metrics from remote hosts
    Supported middleware and protocols
      Apache, Nginx, MySQL, Munin, Latency, Plack, Perlbal, Redis, SNMP, Solr, TheSchwartz
    Pluggable
      Hatena::Mackerel::Worker::Agent::XXX
    Metrics are fetched over SNMP, HTTP, or Telnet
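One way a pluggable layout like this can work is to resolve the agent class from its short name at run time. This is only a sketch of that idea: the namespace Hatena::Mackerel::Worker::Agent comes from the slide, while the two toy classes, the `fetch` method, `run_agent` and the returned values are assumptions made for illustration.

```perl
use strict;
use warnings;

# Two toy agents under the common namespace. Their names, the fetch() method
# and the returned values are made up for this example.
package Hatena::Mackerel::Worker::Agent::Nginx {
    sub fetch { my ($class, $host) = @_; return { active_connections => 291 } }
}
package Hatena::Mackerel::Worker::Agent::Memcached {
    sub fetch { my ($class, $host) = @_; return { curr_connections => 92 } }
}

package main;

# Resolve an agent class from its short name and ask it for metrics.
sub run_agent {
    my ($name, $host) = @_;
    die "invalid agent name: $name" unless $name =~ /\A\w+\z/;
    my $class = "Hatena::Mackerel::Worker::Agent::$name";
    die "no such agent: $name" unless $class->can('fetch');
    return $class->fetch($host);
}

my $metrics = run_agent('Nginx', 'web01.example.com');
print "$_ = $metrics->{$_}\n" for sort keys %$metrics;
```

With this shape, supporting another middleware means adding one more class under the namespace. In a real code base each agent would presumably live in its own file and be loaded with `require` before the call.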
  15. 86.
  16. 90.

    [Diagram: Job Queue, DB, RRDtool, four Hosts, enqueue.pl, cron. Arrow labels: enqueue jobs, fetch host information, fetch metrics]
  17. 93.

    Overall system
    [Diagram: Job Queue, DB, RRD, four Hosts, enqueue.pl, cron, RRDtool, App Server, Browser]
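The flow in the diagram (cron runs enqueue.pl, jobs go onto a queue, workers fetch metrics and store them) can be sketched with in-memory stand-ins. Everything below is hypothetical: the host list, the agent names per host, the dummy `fetch_metric`, and the plain array used in place of a real job queue and RRD files.

```perl
use strict;
use warnings;

# Stand-in for the host information kept in the DB.
my @hosts = (
    { name => 'web01', agents => [qw(SNMP Nginx)] },
    { name => 'db01',  agents => [qw(SNMP MySQL)] },
);

# What enqueue.pl would do on each cron run: one job per host and agent.
sub enqueue_jobs {
    my ($hosts) = @_;
    my @queue;
    for my $host (@$hosts) {
        push @queue, { host => $host->{name}, agent => $_ } for @{ $host->{agents} };
    }
    return \@queue;
}

# Dummy metric fetch; a real agent would talk SNMP, HTTP or Telnet here.
sub fetch_metric {
    my ($job) = @_;
    return length("$job->{host}$job->{agent}");
}

# Worker loop: drain the queue and append each value to its series.
sub work {
    my ($queue, $store) = @_;
    while (my $job = shift @$queue) {
        my $series = "$job->{host}__$job->{agent}";   # one series per host and agent
        push @{ $store->{$series} }, fetch_metric($job);
    }
    return $store;
}

my $queue = enqueue_jobs(\@hosts);
my $store = work($queue, {});
print "$_: @{ $store->{$_} }\n" for sort keys %$store;
```

Splitting enqueueing from fetching lets slow or unreachable hosts delay only their own jobs, and the number of workers can be scaled independently of the cron schedule.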
  18. 96.
  19. 98.

    rrdtool graph

      $ rrdtool graph --end now --start end-120000s --width 400 \
          DEF:ds0a=/home/rrdtool/data/router1.rrd:ds0:AVERAGE \
          DEF:ds0b=/home/rrdtool/data/router1.rrd:ds0:AVERAGE:step=1800 \
          DEF:ds0c=/home/rrdtool/data/router1.rrd:ds0:AVERAGE:step=7200 \
          LINE1:ds0a#0000FF:"default resolution\l" \
          LINE1:ds0b#00CCFF:"resolution 1800 seconds per interval\l" \
          LINE1:ds0c#FF00FF:"resolution 7200 seconds per interval\l"

    --end / --start: the time range to draw
    --width: the graph width
    DEF: the data source (RRD file name)
    LINE1: the line thickness, color, and legend
    ...
  20. 102.

      my $fmt = <<'EOS';
      graphformula = "[graph:" ( graphs [ ":" option ] ) "]"
      graphs = graph | "(" graph ")" 0*( ",(" graph ")" )
      graph = "path:" elements | instruction ":" [ name ] ":::" option )
      elements = object : [ tag : [ label ] ]
      object = name
      tag = name
      label = name
      instruction = "def" | "cdef" | "vdef" | "line" digit | "area" | "hrule" | "vrule" | "print" | "gprint" | "commend" | "tick" | "shift" | "textalign"
      option = 1*( char | "," | "=" | "#" | "@" | ":" | " " | "\" | op)
      name = 1*char
      char = alphanum | mark
      op = "+" | "/" | "%" | "*"
      mark = "-" | "_" | "." | "!" | "~" | "*" | "'"
      alphanum = alpha | digit
      alpha = lowalpha | upalpha
      lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
      upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
      digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
      EOS

    The specification is defined rigorously in BNF
  21. 104.

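    RRD disk I/O load
    (number of hosts) x (number of metric types) I/Os per 5 minutes
      Number of hosts = several thousand
      Number of metric types = 20 or more
      10k I/Os per 5 minutes
    In practice the I/O is concentrated in one part of each 5-minute window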
  22. 106.

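    rrdcached
    A daemon that accepts rrdtool update and aggregation requests
    RRD updates tend to pile up at the same moment
    It delays each write by a random amount of time
    Without rrdcached: 700 IOPS on average, 1200 IOPS at burst
      Manageable on SSD
      Tough on HDD (AWS)
    The release version (1.4.8) supports only update, but trunk supports the other subcommands as well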
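The effect of delaying each write by a random amount can be seen in a small simulation. This is a toy model, not a measurement: the update count borrows the 10k figure from the earlier slide, the 300-second window is an assumption, and `writes_per_second` is a hypothetical helper.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Count writes per second when every update is flushed at once ($delay == 0)
# versus when each write is delayed by a random number of whole seconds
# in the range [0, $delay).
sub writes_per_second {
    my ($updates, $delay) = @_;
    my @per_second = (0) x ($delay > 0 ? $delay : 1);
    for (1 .. $updates) {
        my $slot = $delay > 0 ? int(rand($delay)) : 0;
        $per_second[$slot]++;
    }
    return \@per_second;
}

my $updates = 10_000;   # illustrative: updates arriving in one 5-minute cycle
my $burst   = writes_per_second($updates, 0);
my $spread  = writes_per_second($updates, 300);

printf "no delay:     peak %d writes/s\n", max(@$burst);
printf "random delay: peak %d writes/s\n", max(@$spread);
```

The total number of writes is unchanged; only the peak drops, which is what matters for a disk with a limited IOPS budget. In rrdcached itself this behaviour is controlled by the `-z` (random delay) option together with `-w` (write timeout).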
  23. 109.
  24. 113.

      $ rrdtool graph ./dskUsed.png \
          --start="now-1y" \
          --end="now+2y" \
          --imgformat=PNG \
          --title="Disk Usage" \
          --height=200 \
          --width=500 \
          --lower-limit=0 \
          DEF:usage=testhost__dskUsed.rrd:value:MAX \
          DEF:total=testhost__dskTotal.rrd:value:MAX \
          CDEF:c_warn=total,0.85,* \
          CDEF:c_crit=total,0.95,* \
          VDEF:v_total=total,MINIMUM \
          VDEF:v_warn=c_warn,MINIMUM \
          VDEF:v_crit=c_crit,MINIMUM \
          VDEF:v_usage_slope=usage,LSLSLOPE \
          VDEF:v_usage_intercept=usage,LSLINT \
          CDEF:c_usage_predict=usage,POP,v_usage_slope,COUNT,*,v_usage_intercept,+ \
          CDEF:c_rwarn=c_usage_predict,v_warn,v_total,LIMIT \
          VDEF:v_rwarn=c_rwarn,FIRST \
          CDEF:c_rcrit=c_usage_predict,v_crit,v_total,LIMIT \
          VDEF:v_rcrit=c_rcrit,FIRST \
          HRULE:v_warn#FF8800:"warning":dashes=5 \
          VRULE:v_rwarn#FF8800::dashes=5 \
          HRULE:v_crit#FF4400:"critical" \
          VRULE:v_rcrit#FF4400 \
          HRULE:v_total#FF0000:"total" \
          AREA:usage#00FF00:"Disk Usage" \
          LINE1:c_usage_predict#0000FF:"Predict" \
          GPRINT:v_rwarn:"Reach warning (85%)\: %c":strftime \
          GPRINT:v_rcrit:"Reach critical (95%)\: %c":strftime

    shoichimasuhara++
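The prediction in the command above rests on a least-squares line fitted to the usage series (LSLSLOPE and LSLINT) and then extended until it crosses the 85% and 95% thresholds. The sketch below reproduces that idea outside rrdtool. It fits over zero-based sample indices, which may not match rrdtool's internal indexing exactly, and the data, `least_squares_line` and `steps_until` are made up for illustration.

```perl
use strict;
use warnings;

# Fit y = slope * i + intercept over sample indices i = 0 .. n-1.
sub least_squares_line {
    my (@y) = @_;
    my $n = @y;
    die "need at least two samples" if $n < 2;
    my ($sum_x, $sum_y, $sum_xy, $sum_xx) = (0, 0, 0, 0);
    for my $i (0 .. $n - 1) {
        $sum_x  += $i;
        $sum_y  += $y[$i];
        $sum_xy += $i * $y[$i];
        $sum_xx += $i * $i;
    }
    my $slope     = ($n * $sum_xy - $sum_x * $sum_y) / ($n * $sum_xx - $sum_x ** 2);
    my $intercept = ($sum_y - $slope * $sum_x) / $n;
    return ($slope, $intercept);
}

# Sample index at which the fitted line reaches $threshold,
# or undef when usage is not growing.
sub steps_until {
    my ($slope, $intercept, $threshold) = @_;
    return undef if $slope <= 0;
    my $steps = ($threshold - $intercept) / $slope;
    return $steps < 0 ? 0 : $steps;
}

# Made-up data: a disk of size 100 whose usage grows by 2 per sample.
my @usage = map { 40 + 2 * $_ } 0 .. 9;
my $total = 100;
my ($slope, $intercept) = least_squares_line(@usage);
printf "warning (85%%) reached at sample %.1f\n",  steps_until($slope, $intercept, 0.85 * $total);
printf "critical (95%%) reached at sample %.1f\n", steps_until($slope, $intercept, 0.95 * $total);
```

Multiplying the resulting sample index by the RRD step gives the time offset that the graph marks with its vertical rules.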
  25. 116.

    D3.js
      NVD3.js (Re-usable charts for d3.js)
      Rickshaw (time-series graph)
    rrdtool xport --json returns values as JSON, so the storage can stay RRD
    D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS.
  26. 118.

    Use storage such as Redis or MongoDB instead
    FnordMetric (http://fnordmetric.io/)
      Similar to GrowthForecast
      Storage: Redis
      Graphs: Rickshaw

      use Furl;
      use JSON qw(encode_json);

      my $furl = Furl->new(timeout => 10);

      # POST an increment event for the "tweets_per_minute" gauge
      my $res = $furl->post(
          "http://fnordmetricshost:4242/events",
          ['Content-Type' => 'application/json'],
          encode_json({
              "_type" => '_incr',
              "value" => scalar @$statuses,
              "gauge" => "tweets_per_minute",
          }),
      );
  27. 120.