Technology Supporting Continuous Application Improvement at Mercari / Continuous improvement applications and Mercari SRE #retty_tech_cafe

Technology Supporting Continuous Application Improvement at Mercari - Retty Tech Cafe #5 2016/03/12

kazeburo

March 12, 2016

Transcript

  1. 2.

    Me
    • Masahiro Nagano
    • @kazeburo
    • Mercari, Inc. Principal Engineer, Site Reliability Engineering Team
    • BASE, Inc. Technical Advisor
    • Fukuoka Ruby Award, GMO Pepabo Prize
  2. 4.

    Agenda
    • Mercari & Infrastructure
    • 10+ Deploys per Day
      • Raise deployment frequency to speed up improvement
    • Monitoring
      • Improve service reliability
  3. 8.

    Infrastructure
    • JP: Sakura Internet, Ishikari DC (dedicated servers and cloud)
    • US: Amazon Web Services, US West (Oregon) Region
    • Akamai, Google BigQuery, mackerel, Amazon Route53, Amazon S3, Amazon CloudFront
  4. 9.

    Architecture [diagram; icon-sheet text omitted]
    • JP: nginx (SPDY/HTTP2), DNS-RR, App, MySQL, memcached
    • US: nginx (SPDY/HTTP2), DNS-RR, App, MySQL, memcached, on EC2 and RDS
  5. 10.

    Architecture [diagram; icon-sheet text omitted]
    • JP: nginx (SPDY/HTTP2), DNS-RR, App, MySQL, memcached
    • US: nginx (SPDY/HTTP2), DNS-RR, App, MySQL, memcached, on EC2 and RDS
    • Avoid vendor lock-in
    • Deployable to any IaaS or DC: UK, Asia, etc
  6. 11.

    Software
    • nginx/OpenResty, Apache
    • PHP 5.6, Go, Node, Python, Perl
    • MySQL 5.6, Q4M, memcached
    • widebullet, Solr, Gaurun, fluentd
    • Norikra, Kibana, kurado
    • Consul, unbound
    We ❤ OSS. Stay tuned: https://github.com/mercari/
  7. 16.

    Deployment of PHP apps [diagram]
    • rsync copies files over time; partway through, the server holds index.php (NEW), required_1.php, required_2.php, required_4.php
    • The NEW index.php runs require "required_1.php", require "required_2.php", require "required_3.php", require "required_4.php", but the NEW required_3.php has not arrived yet
    • Responses over time: 200 OK, 500 ISE, 500 ISE, 200 OK
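The race on this slide can be modelled in a few lines. This is only a sketch: the copy order (new index.php first, required_3.php last) and the function names are assumptions made for illustration, not something the slide states.

```python
# Hypothetical model: a request succeeds only if every file that the
# currently deployed index.php requires is already on disk.

OLD_REQUIRES = ["required_1.php", "required_2.php", "required_4.php"]
NEW_REQUIRES = ["required_1.php", "required_2.php",
                "required_3.php", "required_4.php"]


def handle_request(files_on_disk, requires):
    """Return an HTTP-like status for one request."""
    missing = [name for name in requires if name not in files_on_disk]
    return "500 ISE" if missing else "200 OK"


def simulate_sync():
    """Issue one request before, two during, and one after the sync."""
    on_disk = set(OLD_REQUIRES)
    statuses = [handle_request(on_disk, OLD_REQUIRES)]   # before the deploy

    # Assumed copy order: the new index.php lands before required_3.php,
    # so requests in this window hit a missing file.
    statuses.append(handle_request(on_disk, NEW_REQUIRES))
    statuses.append(handle_request(on_disk, NEW_REQUIRES))

    on_disk.add("required_3.php")                        # sync finished
    statuses.append(handle_request(on_disk, NEW_REQUIRES))
    return statuses
```

Under these assumptions `simulate_sync()` reproduces the 200, 500, 500, 200 sequence shown on the slide.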
  8. 17.

    Zero downtime deployment
    • Blue-Green Deployment
    • Symlink Swapping Deployment
    • Request Pausing Deployment
    • Mostly cloud-specific; requires 2x servers
    • (Maybe) slow
    • pixiv: WEB+DB PRESS Vol.84
    • Etsy: codeascraft.com/2013/07/01/atomic-deploys-at-etsy/
    • Complex opcache/apc operation
    • Mercari: combined with ngx_dynamic_upstream
  9. 19.

    ngx_dynamic_upstream

        upstream backend {
            zone backend_zone 1m;
            server 127.0.0.1:6001;
            server 127.0.0.1:6002;
            server 127.0.0.1:6003;
        }

        server {
            listen 6000;

            location /dynamic {
                allow 127.0.0.1;
                deny all;
                dynamic_upstream;
            }

            location / {
                proxy_pass http://backend;
            }
        }

    DOWN:
        $ curl "127.0.0.1:6000/dynamic?upstream=backend_zone&server=127.0.0.1:6003&down="
    UP:
        $ curl "127.0.0.1:6000/dynamic?upstream=backend_zone&server=127.0.0.1:6003&up="
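The two curl calls differ only in the final `down=` or `up=` parameter, so a small helper can build both. The sketch below is an illustration, not Mercari's tooling: the function names are hypothetical, and the default host is simply the `listen 6000` endpoint from the config on this slide.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dynamic_upstream_url(zone, server, action, host="127.0.0.1:6000"):
    """Build the /dynamic URL that marks one upstream server down or up."""
    if action not in ("down", "up"):
        raise ValueError("action must be 'down' or 'up'")
    # Keep ':' unescaped so the server address matches the curl examples.
    query = urlencode({"upstream": zone, "server": server, action: ""},
                      safe=":")
    return f"http://{host}/dynamic?{query}"


def set_server_state(zone, server, action):
    """Send the request (needs a running nginx with ngx_dynamic_upstream)."""
    with urlopen(dynamic_upstream_url(zone, server, action)) as response:
        return response.read().decode()
```

For example, `dynamic_upstream_url("backend_zone", "127.0.0.1:6003", "down")` yields the same URL as the DOWN curl call.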
  10. 20.

    Zero downtime deployment [diagram]
    • nginx + ngx_dynamic_upstream sits in front of the app servers
    • While rsync updates a server, that server is marked DOWN and its requests go to another upstream; afterwards it is marked UP again
    • The NEW index.php and its require "required_1.php" through require "required_4.php" are never served half-copied
    • Responses over time: 200 OK, 200 OK, 200 OK, 200 OK
  11. 21.

    Zero down time deployment [diagram]
    • deploy host: ansible
    • front: nginx with dynamic upstream
    • each App server: nginx with dynamic upstream, App, rsync wrapper
  12. 22.

    Zero down time deployment [same diagram as the previous slide]
    • ansible on the deploy host runs: rsync --rsync-path=/path/to/wrapper
  13. 23.

    Zero down time deployment [same diagram as the previous slide]
    • ansible on the deploy host runs: rsync --rsync-path=/path/to/wrapper
    • the rsync wrapper on each App server:

        #!/bin/sh
        # take this server out of the upstream
        mercari_app_ctl down
        # run the real rsync with the arguments passed to the wrapper
        rsync "$@"
        sleep 1
        # put this server back into the upstream
        mercari_app_ctl up
  14. 24.

    Zero down time deployment [same diagram and wrapper script as the previous slide]
    • Step: the wrapper runs mercari_app_ctl down; the App servers are marked down in the nginx dynamic upstreams
  15. 25.

    Zero down time deployment [same diagram and wrapper script as the previous slide]
    • Step: rsync copies the new files while the server is out of rotation
  16. 26.

    Zero down time deployment [same diagram and wrapper script as the previous slide]
    • Step: after rsync and sleep 1, the wrapper runs mercari_app_ctl up; the App servers are marked up again
  17. 27.

    Zero down time deployment [same diagram and wrapper script as the previous slide]
    • Result: every rsync from ansible goes through the wrapper, which runs mercari_app_ctl down, rsync, sleep 1, mercari_app_ctl up
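The down, rsync, up sequence walked through on the slides above can be sketched as one function. This is an illustrative rewrite of the wrapper, not the script Mercari runs: `deploy_with_drain` and its parameters are hypothetical, and only `mercari_app_ctl down`, `rsync`, `sleep 1`, and `mercari_app_ctl up` come from the slides.

```python
import subprocess
import time


def deploy_with_drain(rsync_args, run=subprocess.check_call,
                      pause=1.0, sleep=time.sleep):
    """Take the app out of rotation, sync files, then put it back."""
    run(["mercari_app_ctl", "down"])
    try:
        run(["rsync", *rsync_args])
        sleep(pause)  # the slides use `sleep 1` before going back up
    finally:
        # Assumption: always restore traffic, even if rsync fails,
        # so a broken sync does not leave the server drained.
        run(["mercari_app_ctl", "up"])
```

Passing `run` and `sleep` as parameters keeps the sketch testable without a real `mercari_app_ctl` binary. Note that a shell script without `set -e` also reaches the `up` step after a failed rsync, so the `finally` block mirrors that behaviour.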
  18. 28.
  19. 29.

    Who kicks Ansible? [diagram]
    • deploy host: ansible, sending rsync to the App servers
    • each App server: nginx with dynamic upstream, App, rsync wrapper
  20. 32.

    Inside ChatOps [diagram; labels in extracted order]
    • @bot: yes
    • check PullReq (LGTM?)
    • check Ticket (ReleaseOK?)
    • exec Ansible
    • merge PR
    • git clone
    • deploy application
    • git clone
    • preprocessing
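The two checks in the diagram act as gates in front of the deploy. A minimal sketch of that gating follows; the class, the function, and the order of the steps after the checks are assumptions for illustration, since the transcript does not preserve the arrows of the diagram.

```python
from dataclasses import dataclass


@dataclass
class ReleaseRequest:
    pull_request_approved: bool  # "LGTM?" on the slide
    ticket_release_ok: bool      # "ReleaseOK?" on the slide


def plan_deploy(request):
    """Return the steps the bot would run, or a single rejection message."""
    if not request.pull_request_approved:
        return ["reject: pull request has no LGTM"]
    if not request.ticket_release_ok:
        return ["reject: ticket is not marked ReleaseOK"]
    # Assumed ordering of the remaining labels from the slide.
    return ["merge PR", "git clone", "preprocessing",
            "exec Ansible", "deploy application"]
```

Returning a plan instead of executing it keeps the gate logic separate from side effects such as merging or running Ansible.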
  21. 33.

    10+ Deploys per Day
    • Zero down time deployment
    • ChatOps + Calendar
    Deployment frequency: the foundation of continuous application improvement
  22. 35.

    Monitoring
    • Log monitoring
      • fluentd, Norikra, Mackerel
    • Agent based monitoring
      • Mackerel, NewRelic, Kurado
  23. 36.

    Log analysis system [diagram; icon-sheet text omitted]
    • Sources: App, Batch, Worker
    • Logs: access_log, application_log, app_error_log, error_log, php_log...
    • Log storage and viewing: BigQuery, Kibana, Log Viewer
    • Stream Processing (CEP): Mackerel, Slack
  24. 39.

    Norikra + mackerel

        SELECT
          COUNT(1, status like "5%") AS count_5xx,
          COUNT(1, status like "4%") AS count_4xx,
          COUNT(1, status like "3%") AS count_3xx,
          COUNT(1, status like "2%") AS count_2xx
        FROM mercari_access_log.win:time_batch(1 min)

    Output: mackerel
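As a rough illustration of what this query produces for each one-minute batch, the same counting can be written in plain Python. This is a sketch of the aggregation only, not of Norikra itself; the function name is hypothetical and the input is assumed to be the `status` values of one batch.

```python
def count_status_classes(statuses):
    """Count 2xx/3xx/4xx/5xx responses in one batch of access-log rows."""
    counts = {"count_5xx": 0, "count_4xx": 0, "count_3xx": 0, "count_2xx": 0}
    for status in statuses:
        first_digit = str(status)[:1]
        if first_digit in ("2", "3", "4", "5"):
            counts[f"count_{first_digit}xx"] += 1
    return counts
```

Each key matches an alias in the query, so the result has the same shape as one output event.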
  25. 40.

    Norikra + mackerel

        SELECT
          avg(ptime) AS ptime_avg,
          percentiles(ptime, {90,95,98,99}) AS percentile
        FROM mercari_access_log.win:time_batch(1 min)

    Alert if 95%tile > n msec (mackerel)
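To make the alert rule concrete, here is a small sketch that computes the same summary for one batch of `ptime` values and checks the 95th percentile against a threshold. It uses the nearest-rank definition, which is an assumption; the transcript does not say how Norikra's `percentiles` function interpolates. All function names are hypothetical.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def summarize(ptimes, pcts=(90, 95, 98, 99)):
    """Mirror the query: average plus the requested percentiles."""
    return {
        "ptime_avg": sum(ptimes) / len(ptimes),
        "percentile": {p: percentile(ptimes, p) for p in pcts},
    }


def should_alert(summary, threshold_msec):
    """Alert if the 95th percentile exceeds the threshold (n msec)."""
    return summary["percentile"][95] > threshold_msec
```

Alerting on a high percentile rather than the average catches slow tails that an average would hide.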
  26. 41.

    Norikra + slack

        SELECT "[" || hostname || "] ```" || message || "```"
        FROM error_mercari_api.win:time_batch(1 min)
        WHERE message like "%PHP Fatal error%"
        GROUP BY hostname, message
        HAVING COUNT(*) > 0
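The effect of this query is one Slack message per distinct (hostname, message) pair of fatal errors in each batch. A minimal Python sketch of that grouping follows; the function name and the dict-shaped events are assumptions, and only the filter string and message layout come from the query.

```python
FENCE = "`" * 3  # Slack code-block delimiter used in the query


def fatal_error_messages(events):
    """Build one message per distinct (hostname, message) fatal error."""
    seen = {}
    for event in events:
        if "PHP Fatal error" in event["message"]:
            key = (event["hostname"], event["message"])
            seen[key] = seen.get(key, 0) + 1
    # Grouping de-duplicates repeated errors within the same batch.
    return ["[" + hostname + "] " + FENCE + message + FENCE
            for (hostname, message) in seen]
```

Grouping by both fields means a burst of identical errors on one host produces a single notification per batch instead of one per log line.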
  27. 44.

    mackerel: Metrics & Alert Hub [diagram]
    • Sources: Worker, Batch, App, MySQL, cron
    • Senders: mackerel-agent, fluent-plugin-mackerel, mkr
    • Roles: threshold settings, metrics collection, notification
  28. 45.

    mackerel custom plugins
    my-ec2-tag[go], jmx-get[go], diff-detector[go], delay-checker, interval-checker, periodic-checker, check-mysql-uptime[go], check-memcached-uptime[go], check-conntrack-free, check-crt-expiration, check-dns-rr, check-hydra-pos, check-inode, check-iptables, check-myip, check-solr-replication, check-solr-update, check-uptime, check-mysql-msr[go], mackerel-plugin-accelmail-counter, mackerel-plugin-gaurun-usage, mackerel-plugin-linux-lite, mackerel-plugin-msr[go], mackerel-plugin-ntpq, mackerel-plugin-search, mackerel-plugin-postfix, mackerel-plugin-php-and-accesslog, mackerel-plugin-php-version
    Wrote 25+ custom plugins and utility commands for monitoring Mercari infrastructure
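For readers unfamiliar with Mackerel plugins: a custom metrics plugin is a command that prints tab-separated lines of metric name, value, and epoch seconds, which mackerel-agent collects. The sketch below shows that output shape only; the metric name is hypothetical and none of this is taken from the plugins listed on the slide.

```python
import time


def format_metric(name, value, epoch=None):
    """Format one metric line: name, value, epoch seconds, tab-separated."""
    if epoch is None:
        epoch = int(time.time())
    return f"{name}\t{value}\t{epoch}"


def main():
    # Hypothetical metric; a real plugin would measure something here.
    print(format_metric("example.queue_length", 3))


if __name__ == "__main__":
    main()
```

The simplicity of this line format is presumably part of why writing many small plugins is practical.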
  29. 46.

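    Kurado
    • github.com/kazeburo/Kurado
    • RRDTool based server metrics tool
    • Author: Me
    • Can display graphs for multiple servers, or two time-series graphs, side by side. Makes it easy to follow relationships between servers and long-term trends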
  30. 49.
  31. 50.

    Summary
    • Technology supporting continuous application improvement
    • 10+ Deploys per Day
      • Safe Deploy with ngx_dynamic_upstream
      • Google Calendar + ChatOps
    • Monitoring
      • Log Monitoring
      • Agent based Monitoring