
3 Popular Ops Tools in Japan


In Japan, there are many large-scale web services that serve more than 3 billion page views (PV) per day. To keep such big services stable, Japanese engineers have developed useful tools for web operations. Although these tools are popular in Japan, I suspect ops engineers in other countries do not know about them. In this talk, I will introduce these tools with as many use cases as possible.

Takumi Sakamoto

June 18, 2013

Transcript

  1. Background

    • There are a lot of large-scale web services in Japan
      - tens of millions of users or more, thousands of servers or more
    • e.g. Mobage by DeNA
      - social game platform
      - 3.5 billion PV/day
      - thousands of servers
  2. Test-Driven Infrastructure

    "If services are provisioned by programs, then we can - and should - test those programs. Further, we can write them in a test-driven style: start with the description of how the service should behave, and write a test that expresses this. Then write code that makes that test pass."
    John Arundel, http://bitfieldconsulting.com/review-test-driven-infrastructure-chef
  3. serverspec: RSpec Tests for Provisioned Servers

        $ cat spec/www.example.jp/httpd_spec.rb
        describe package('httpd') do
          it { should be_installed }
        end

        describe service('httpd') do
          it { should be_running }
        end

        describe file('/etc/httpd/conf/httpd.conf') do
          it { should be_file }
          it { should contain 'ServerName www.example.jp' }
        end

        $ rake spec
        /usr/bin/ruby -S rspec spec/www.example.jp/httpd_spec.rb
        ......

        Finished in 0.99715 seconds
        6 examples, 0 failures
  4. A Lot of Built-in Matchers

        file      should be_file
                  should be_directory
                  should be_linked_to ...
        host      should be_resolvable
                  should be_resolvable.by(host)
                  should be_resolvable.by(dns)
        iptables  should have_rule(-P INPUT ACCEPT)
        package   should be_installed
                  should be_installed.by(gem)
                  should be_installed.with_version(1.0)
        service   should be_enabled
                  should be_running
                  should be_running.under('supervisor')
        user      should exist
                  should be_belong_to_group (wheel)
                  should be_uid(1105)

    And more ...
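Chained modifiers such as `.by(gem)` and `.with_version(1.0)` refine a matcher before it runs. As a rough illustration of how a chain like that could end up as a single shell check, here is a plain-Ruby sketch with no RSpec or serverspec involved. The class name `PackageCheck` and the commands it builds (`rpm -q`, `gem list --local | grep`) are assumptions for illustration, not serverspec's internals.

```ruby
require "shellwords"

# Hypothetical sketch, not serverspec's implementation: a chainable
# "installed?" check that turns itself into a shell command. The commands
# (rpm -q, gem list | grep) are assumptions chosen for illustration.
class PackageCheck
  def initialize(name)
    @name = name
    @provider = nil # nil = system package manager (assumed to be rpm here)
    @version = nil
  end

  # Each modifier stores its argument and returns self, so calls can chain
  # the way `be_installed.by(gem).with_version(1.0)` reads on the slide.
  def by(provider)
    @provider = provider.to_s
    self
  end

  def with_version(version)
    @version = version.to_s
    self
  end

  # The check passes if this command exits with status 0.
  def command
    case @provider
    when nil
      package = @version ? "#{@name}-#{@version}" : @name
      "rpm -q #{Shellwords.escape(package)}"
    when "gem"
      cmd = "gem list --local | grep -w -- ^#{Shellwords.escape(@name)}"
      @version ? "#{cmd} | grep -w -- #{Shellwords.escape(@version)}" : cmd
    else
      raise ArgumentError, "unsupported provider: #{@provider}"
    end
  end
end

PackageCheck.new("httpd").command
# => "rpm -q httpd"
PackageCheck.new("rake").by("gem").with_version("1.0").command
# => "gem list --local | grep -w -- ^rake | grep -w -- 1.0"
```

Returning `self` from each modifier is what makes the chain read left to right. The final command is only built when it is needed.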
  5. Use Case

    • Migrating to, or introducing, a new provisioning framework (PF)
      - write serverspec tests that define how the server should be
        (the tests are independent of the provisioning framework)
      - write the cookbook/manifest/shell script
        (serverspec confirms that the new PF builds the server correctly)
  6. Why serverspec?

    • Lightweight
      - you can start using it without any provisioning framework
      - easy to use, and its behavior is easy to understand
      - no agent required (it uses SSH and shell commands)
    • Independent of any provisioning framework
      - flexible, avoids lock-in ...
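"No agent required" means each check is simply a shell command whose exit status decides pass or fail. serverspec sends those commands over SSH. The sketch below shows the same idea with a local backend built on Ruby's `Open3` in place of SSH. `LocalBackend` and `check` are hypothetical names. A real setup would put an SSH backend (for example one built on the net-ssh gem) behind the same `run` method.

```ruby
require "open3"

# Hypothetical local backend. serverspec itself runs commands over SSH; an SSH
# backend could expose the same #run method and execute on the remote host.
class LocalBackend
  # Returns [combined stdout+stderr, true if the exit status was 0].
  def run(command)
    output, status = Open3.capture2e("sh", "-c", command)
    [output, status.success?]
  end
end

# A check is just a shell command: exit status 0 means the server is in the
# expected state. Nothing has to be installed on the target beforehand.
def check(backend, description, command)
  _output, ok = backend.run(command)
  puts "#{ok ? 'PASS' : 'FAIL'}: #{description}"
  ok
end

backend = LocalBackend.new
check(backend, "/etc/hosts is a file", "test -f /etc/hosts")
check(backend, "httpd process is running", "pgrep -x httpd > /dev/null")
```

Keeping the transport behind one `run` method is what lets the same checks target a local machine or a remote host.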
  7. Visualizing Metrics

    "Imagine designing a car without any of the dials or indicators in front of the driver. Now paint the windshield. That's what it's like to run a web operation without metrics."
    Web Operations: Keeping the Data On Time, by John Allspaw
  8. GrowthForecast: Application Metrics Visualizer

    You can visualize data in MySQL with the following one-liner:

        $ crontab -l
        */5 * * * * curl -F number=`mysql -BN -e 'select count(*) from member' game` http://example.com/api/socialgame/member/register >/tmp/post.log 2>&1
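When the value comes from a script rather than a one-liner, the same `number` POST can be built with Ruby's standard `Net::HTTP`. This is a minimal sketch. The host, the `socialgame/member/register` path and the `number` field mirror the crontab example. The helper name `gf_post_request` and the value 1234 are hypothetical. Unlike `curl -F`, `set_form_data` sends url-encoded form data rather than multipart. This sketch assumes GrowthForecast's API accepts either encoding.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds a form POST for GrowthForecast's
# /api/:service/:section/:graph endpoint with a `number` field,
# the same field that `curl -F number=...` sends in the crontab example.
def gf_post_request(base_url, service, section, graph, number)
  uri = URI.join(base_url, "/api/#{service}/#{section}/#{graph}")
  request = Net::HTTP::Post.new(uri)
  request.set_form_data("number" => number.to_s) # url-encoded, not multipart
  [uri, request]
end

uri, request = gf_post_request("http://example.com", "socialgame", "member", "register", 1234)

# Sending it requires a reachable GrowthForecast server:
# Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```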
  9. Use Case

    • Visualizing everything via the Web API
      - including layered graphs

        POST /api/service1/web/2xx_count
        POST /api/service1/web/3xx_count
        POST /api/service1/web/4xx_count
        POST /api/service1/web/5xx_count

      In GrowthForecast, "Create Layered Graph" combines 2xx_count, 3xx_count, 4xx_count and 5xx_count into one graph, all_http_status_count.
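Each layer of the graph is an ordinary series that receives its own POST. Here is a minimal sketch that buckets HTTP status codes into 2xx to 5xx counts and pairs each count with the path that would receive it. The `service1/web` names follow the slide. The function name `status_class_updates` and the sample status codes are hypothetical.

```ruby
# Hypothetical sketch: bucket HTTP status codes into 2xx..5xx counts and
# pair each count with the GrowthForecast path that would receive it.
CLASSES = %w[2xx 3xx 4xx 5xx].freeze

def status_class_updates(statuses, service: "service1", section: "web")
  counts = Hash.new(0)
  statuses.each do |code|
    klass = "#{code.to_i / 100}xx" # e.g. 404 -> "4xx"
    counts[klass] += 1 if CLASSES.include?(klass)
  end
  # Emit all four series, zeros included, so every layer gets a data point.
  CLASSES.map { |klass| ["/api/#{service}/#{section}/#{klass}_count", counts[klass]] }
end

updates = status_class_updates([200, 200, 304, 404, 500, 201])
updates.each { |path, number| puts "POST #{path} number=#{number}" }
```

Posting zeros explicitly keeps the stacked layers aligned in time even when a class had no requests.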
  10. Why GrowthForecast?

    • Easy to install
      - two yum commands and one cpanm command (on RHEL6)
    • Update metrics and get graphs over HTTP
      - easy to integrate with any system
    • Fixed data size (because it uses RRDtool)
      - no need to plan for disk usage growth
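The fixed data size comes from RRDtool's round-robin design. Each archive has a fixed number of slots, and new samples overwrite the oldest ones. The following Ruby ring buffer illustrates the idea. It does not model RRDtool's actual file format or consolidation functions, and the `RoundRobinArchive` class is hypothetical.

```ruby
# Hypothetical illustration of round-robin storage: a fixed number of slots,
# so storage never grows no matter how many samples arrive.
class RoundRobinArchive
  def initialize(slots)
    @slots = Array.new(slots)
    @next = 0  # index of the slot the next sample will overwrite
    @count = 0 # samples stored so far, capped at the number of slots
  end

  def add(value)
    @slots[@next] = value
    @next = (@next + 1) % @slots.size
    @count = [@count + 1, @slots.size].min
  end

  # Stored samples in chronological order, oldest first.
  def values
    return @slots.first(@count) if @count < @slots.size

    @slots[@next..-1] + @slots[0...@next]
  end

  def size
    @slots.size
  end
end

archive = RoundRobinArchive.new(3)
[10, 20, 30, 40, 50].each { |v| archive.add(v) }
archive.values # => [30, 40, 50]
```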
  11. Log Management

    "Think of the user behind each line of the access log. We cannot be sure that a line with a 200 response code means a happy user. However, we can be sure that a line with a 500 response code means an unhappy user."
    Masahiro Ihara, http://ihara2525.tumblr.com/post/17029509298 (translated by me)
  12. Fluentd: Pluggable Log Management Tool

    • Enables you to build your own log solutions easily
      - structured logging (treat logs as JSON)
      - pluggable architecture:
          input:  File Tail, HTTP, dstat
          buffer: File, Memory
          output: File, Amazon S3, HDFS, GrowthForecast
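In structured logging, every event is a tag, a time and a JSON-like record rather than a free-form text line. The pluggable architecture means each stage can be swapped independently. Here is a minimal plain-Ruby sketch of that input, buffer, output shape. The `Event`, `MemoryBuffer` and `JsonLinesOutput` names and the record fields are hypothetical, and this is not Fluentd's plugin API.

```ruby
require "json"

# Hypothetical sketch of an input -> buffer -> output pipeline where each
# stage is a swappable object and each event is (tag, time, record).
Event = Struct.new(:tag, :time, :record)

# Buffer stage: holds events in memory and hands them off as one chunk.
class MemoryBuffer
  def initialize(chunk_limit)
    @chunk_limit = chunk_limit
    @events = []
  end

  # Returns a full chunk once the limit is reached, otherwise nil.
  def push(event)
    @events << event
    return nil if @events.size < @chunk_limit

    chunk = @events
    @events = []
    chunk
  end
end

# Output stage: serializes each event as one JSON line (kept in memory here
# so the example has no side effects).
class JsonLinesOutput
  attr_reader :lines

  def initialize
    @lines = []
  end

  def write(chunk)
    chunk.each do |e|
      @lines << JSON.generate("tag" => e.tag, "time" => e.time, "record" => e.record)
    end
  end
end

buffer = MemoryBuffer.new(2)
output = JsonLinesOutput.new

# Input stage: structured records instead of raw log lines.
[{ "status" => 200, "path" => "/" }, { "status" => 500, "path" => "/api" }].each do |rec|
  chunk = buffer.push(Event.new("app1.apache.access", 1371513600, rec))
  output.write(chunk) if chunk
end
puts output.lines
```

Because every record is already a hash, later stages can filter or count on a field such as `status` without re-parsing text.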
  13. Use Case

    • Servers forward their access logs to a Worker, which:
      - archives them hourly to Amazon S3
      - feeds GrowthForecast for visualizing
      - stores them in MongoDB for searching recent logs
  14. Server (input logs)

    input from log file (on each server; shown for app1 and app2):

        <source>
          type tail
          path /var/log/httpd-access.log
          pos_file /var/log/access.log.pos
          tag app1.apache.access
          format apache2
        </source>

        <source>
          type tail
          path /var/log/httpd-access.log
          pos_file /var/log/access.log.pos
          tag app2.apache.access
          format apache2
        </source>

    forward to worker:

        <match app*.apache.*>
          type forward
          <server>
            name worker
            host 192.168.1.3
            port 24224
          </server>
        </match>
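Routing in this setup depends on tag patterns such as `app*.apache.*`. In Fluentd's `<match>` directive, `*` matches exactly one tag part, and `**` matches zero or more parts. Below is a simplified Ruby sketch of that rule. It ignores Fluentd's `{a,b}` alternation, and `tag_match?` and `match_parts?` are hypothetical names, not Fluentd's internal API.

```ruby
# Hypothetical, simplified matcher for Fluentd-style tag patterns:
#   "*"  inside a part matches any characters except "."
#   "**" as a whole part matches zero or more parts
# Fluentd's "{a,b}" alternation is not supported in this sketch.
def tag_match?(pattern, tag)
  match_parts?(pattern.split("."), tag.split("."))
end

def match_parts?(patterns, parts)
  return parts.empty? if patterns.empty?

  head, *rest = patterns
  if head == "**"
    # Let "**" swallow 0, 1, 2, ... leading parts and match the rest.
    (0..parts.size).any? { |n| match_parts?(rest, parts.drop(n)) }
  else
    return false if parts.empty?

    # Turn a single-part glob such as "app*" into an anchored regexp.
    regexp = Regexp.new("\\A" + head.split("*", -1).map { |s| Regexp.escape(s) }.join("[^.]*") + "\\z")
    regexp.match?(parts.first) && match_parts?(rest, parts.drop(1))
  end
end

tag_match?("app*.apache.*", "app1.apache.access") # => true
tag_match?("app*.apache.*", "app1.access")        # => false
tag_match?("a.**", "a")                           # => true
```

Because `*` never crosses a dot, a tag must have exactly as many parts as the pattern unless `**` is used. That is why tags and `<match>` patterns need to be designed together.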
  15. Worker (output logs)

    input from servers:

        <source>
          type forward
          port 24224
          bind 0.0.0.0
        </source>

    copy each access log to Amazon S3 (fluent-plugin-s3), MongoDB (fluent-plugin-mongo) and fluent-plugin-datacounter:

        <match app*.apache.access>
          type copy
          <store>
            type s3
            s3_bucket apache_access
            s3_endpoint s3-us-west-1.amazonaws.com
            path logs/
          </store>
          <store>
            type mongo
            host fluentd
            port 27017
            database apache
            collection access
          </store>
          <store>
            type datacounter
            count_interval 60
            aggregate tag
            output_per_tag yes
            tag_prefix datacounter
            count_key status
            pattern1 2xx ^2\d\d$
            pattern2 3xx ^3\d\d$
            pattern3 4xx ^4\d\d$
            pattern4 5xx ^5\d\d$
          </store>
        </match>

    post the counts to GrowthForecast (fluent-plugin-growthforecast):

        <match datacounter.app*.apache.access>
          type growthforecast
          gfapi_url http://gf.com/api/
          service service1:web
          tag_for section
          name_keys 2xx_count,3xx_count,4xx_count,5xx_count
        </match>
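The datacounter step tests each record's `count_key` field (`status` here) against the numbered patterns during every `count_interval` and emits `<name>_count` fields such as `2xx_count`. The plugin also reports other fields, such as rates. Below is a plain-Ruby sketch of only the counting step, using the same four patterns. The function name `count_by_pattern` and the `unmatched_count` catch-all bucket are hypothetical, and this is not the plugin's code.

```ruby
# Hypothetical sketch of the counting step: match each record's "status"
# against ordered regex patterns (first match wins) and tally per pattern.
PATTERNS = [
  ["2xx", /^2\d\d$/],
  ["3xx", /^3\d\d$/],
  ["4xx", /^4\d\d$/],
  ["5xx", /^5\d\d$/],
].freeze

def count_by_pattern(records, count_key: "status", patterns: PATTERNS)
  counts = patterns.map { |name, _| ["#{name}_count", 0] }.to_h
  counts["unmatched_count"] = 0 # catch-all for records matching no pattern
  records.each do |record|
    value = record[count_key].to_s
    name, _ = patterns.find { |_, regexp| regexp.match?(value) }
    counts[name ? "#{name}_count" : "unmatched_count"] += 1
  end
  counts
end

records = [
  { "status" => "200" }, { "status" => "404" }, { "status" => "503" },
  { "status" => "200" }, { "path" => "/" },
]
counts = count_by_pattern(records)
puts counts
```

The resulting keys line up with the `name_keys` list handed to GrowthForecast. Each count can therefore become one layer of the status graph.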
  16. Why Fluentd?

    • Real-time log processing
      - you can see what's going on right now; batch processing is too late
    • Extensibility
      - input, buffer and output are pluggable, and many plugins exist
    • Reliability
      - HA configuration (re-send, failover)
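The reliability point rests on two behaviours of buffered forwarding. A chunk that fails to send stays in the buffer and is retried, and when the primary destination is down, events fail over to a standby. Here is a minimal Ruby sketch of that control flow. The `FailoverForwarder` class, the sender lambdas and the retry count are hypothetical. Real Fluentd also adds backoff between retries, heartbeats and file-backed buffers.

```ruby
# Hypothetical sketch: deliver each chunk to the first destination that
# accepts it, retrying each destination a few times before failing over.
# A chunk leaves the buffer only after a successful send.
class FailoverForwarder
  attr_reader :buffer

  def initialize(destinations, retries_per_destination: 2)
    @destinations = destinations # ordered [name, sender] pairs: primary first
    @retries = retries_per_destination
    @buffer = []
  end

  def emit(chunk)
    @buffer << chunk
    flush
  end

  # Try to deliver every buffered chunk; undelivered chunks stay buffered.
  def flush
    @buffer.reject! { |chunk| deliver(chunk) }
  end

  private

  def deliver(chunk)
    @destinations.each do |_name, sender|
      @retries.times do
        begin
          sender.call(chunk)
          return true
        rescue IOError
          next # retry this destination (a real system would back off first)
        end
      end
    end
    false
  end
end

delivered = []
primary = ->(_chunk) { raise IOError, "primary is down" }
standby = ->(chunk) { delivered << chunk }

forwarder = FailoverForwarder.new([["primary", primary], ["standby", standby]])
forwarder.emit("chunk-1")
# delivered now holds "chunk-1" and forwarder.buffer is empty
```

If every destination fails, the chunk stays in `buffer`, and a later `flush` can re-send it.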
  17. 3 Popular Ops Tools in Japan

    • serverspec: RSpec Tests for Provisioned Servers
    • GrowthForecast: Application Metrics Visualizer
    • Fluentd: Pluggable Log Management Tool