Technologies That Support Continuous Application Improvement at Mercari / Continuous improvement applications and Mercari SRE #retty_tech_cafe

Technologies That Support Continuous Application Improvement at Mercari - Retty Tech Cafe #5 2016/03/12

kazeburo

March 12, 2016

Transcript

  1. Technologies That Support Continuous Application Improvement at Mercari. Retty Tech Cafe #5. Masahiro Nagano a.k.a. kazeburo
  2. Me • Masahiro Nagano • @kazeburo • Mercari, Inc. Principal Engineer, Site Reliability Engineering Team • BASE, Inc. Technical Advisor • Fukuoka Ruby Award, GMO Pepabo Prize
  3. SRE@Mercari • Find and resolve system problems to improve service reliability • Develop and operate middleware that lets the service scale

  4. Agenda • Mercari & Infrastructure • 10+ Deploys per Day • Raise deploy frequency to speed up improvement • Monitoring • Improve service reliability
  5. Mercari Your Friendly Mobile MarketPlace JP US

  6. Mercari KPI • DOWNLOAD: 32 million DL (JP+US) • GMV: over 10 billion yen per month • LISTED ITEMS: several hundred thousand or more per day

  7. Traffic 1,200,000 reqs/min (API/HTTPS)

  8. Infrastructure • JP: SAKURA Internet Ishikari DC, dedicated servers and cloud • US: Amazon Web Service US West (Oregon) Region • Akamai, Google BigQuery, mackerel, Amazon Route53, Amazon S3, Amazon CloudFront
  9. Architecture (diagram): JP and US each show mobile clients, SPDY/HTTP2, nginx, DNS-RR, App servers, MySQL and memcached; the US side is drawn on EC2 and RDS.
  10. Architecture (diagram): JP and US each show mobile clients, SPDY/HTTP2, nginx, DNS-RR, App servers, MySQL and memcached; the US side is drawn on EC2 and RDS. • Avoid vendor lock-in • Deployable to any IaaS or DC • UK, Asia, etc.
  11. Softwares • nginx/OpenResty, Apache • PHP 5.6, Go, Node, Python, Perl • MySQL 5.6, Q4M, memcached • widebullet, Solr, Gaurun, fluentd • Norikra, Kibana, kurado • Consul, unbound • We ❤ OSS. Stay tuned: https://github.com/mercari/
  12. 10+ Deploys per Day without downtime and 50x errors

  13. Deployment of Applications

  14. JUST COPYING FILES. No restart needed.

  15. JUST COPYING FILES? No restart needed.

  16. Deployment of PHP apps (timeline diagram): rsync copies files over time. The NEW index.php requires required_1.php to required_4.php, but required_3.php (NEW) is not yet among the files on the server (index.php, required_1.php, required_2.php, required_4.php). Responses along the timeline: 200 OK, 500 ISE, 500 ISE, 200 OK.
  17. Zero downtime deployment
    • Blue-Green Deployment: mostly cloud-specific; requires 2x servers; (maybe) slow
    • Symlink Swapping Deployment: pixiv (WEB+DB PRESS Vol.84), Etsy (codeascraft.com/2013/07/01/atomic-deploys-at-etsy/); complex opcache/apc operation
    • Request Pausing Deployment: Mercari, combined with ngx_dynamic_upstream
  18. ngx_dynamic_upstream • github.com/cubicdaiya/ngx_dynamic_upstream • nginx extension for operating upstreams dynamically • cubicdaiya: Principal Engineer at Mercari, Inc., SRE
  19. ngx_dynamic_upstream

    upstream backend {
        zone backend_zone 1m;
        server 127.0.0.1:6001;
        server 127.0.0.1:6002;
        server 127.0.0.1:6003;
    }

    server {
        listen 6000;

        location /dynamic {
            allow 127.0.0.1;
            deny all;
            dynamic_upstream;
        }

        location / {
            proxy_pass http://backend;
        }
    }

    DOWN:
    $ curl "127.0.0.1:6000/dynamic?upstream=backend_zone&server=127.0.0.1:6003&down="

    UP:
    $ curl "127.0.0.1:6000/dynamic?upstream=backend_zone&server=127.0.0.1:6003&up="
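The two curl calls on slide 19 differ only in the trailing `down=` or `up=` flag. A minimal Python sketch that builds the same URLs; the function name `dynamic_upstream_url` is hypothetical, and the endpoint, zone and server values are simply the ones shown on the slide.

```python
from urllib.parse import urlencode


def dynamic_upstream_url(endpoint, upstream, server, state):
    """Build the URL for the /dynamic location from slide 19.

    state must be "up" or "down"; it is sent as an empty-valued flag,
    matching the slide's `up=` / `down=` query parameters.
    """
    if state not in ("up", "down"):
        raise ValueError("state must be 'up' or 'down'")
    # safe=":" keeps host:port readable, as in the slide's curl examples.
    query = urlencode({"upstream": upstream, "server": server, state: ""}, safe=":")
    return f"http://{endpoint}/dynamic?{query}"


print(dynamic_upstream_url("127.0.0.1:6000", "backend_zone", "127.0.0.1:6003", "down"))
print(dynamic_upstream_url("127.0.0.1:6000", "backend_zone", "127.0.0.1:6003", "up"))
```

Keeping URL construction separate from the HTTP call makes it easy to check before pointing it at a real nginx.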
  20. Zero downtime deployment (timeline diagram): the same rsync of the NEW index.php and required_3.php, now behind nginx + ngx_dynamic_upstream. The server being updated is marked DOWN and its requests go to another upstream; afterwards it is marked UP again. Responses along the timeline: 200 OK throughout.
  21. Zero downtime deployment (diagram): ansible on the deploy host; three nginx, each with dynamic upstream; four App servers, each with an rsync wrapper.
  22. Zero downtime deployment (diagram): ansible on the deploy host; three nginx, each with dynamic upstream; four App servers, each with an rsync wrapper. ansible runs: rsync --rsync-path=/path/to/wrapper
  23. Zero downtime deployment (diagram): ansible runs rsync --rsync-path=/path/to/wrapper, and the wrapper on each App server is:

    #!/bin/sh
    mercari_app_ctl down
    rsync $*
    sleep 1
    mercari_app_ctl up
  24. Zero downtime deployment (same diagram and wrapper script as slide 23). Step shown: mercari_app_ctl down (down, down, down).
  25. Zero downtime deployment (same diagram and wrapper script as slide 23). Step shown: rsync.
  26. Zero downtime deployment (same diagram and wrapper script as slide 23). Step shown: mercari_app_ctl up (up, up, up).
  27. Zero downtime deployment (same diagram). ansible deploys with rsync through the wrapper: mercari_app_ctl down, rsync $*, sleep 1, mercari_app_ctl up.
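The wrapper's order of operations (down, rsync, sleep, up) can be sketched in Python as below. This is an illustration under assumptions, not Mercari's implementation: `deploy_with_drain` and its callback parameters are hypothetical names, and the callbacks stand in for `mercari_app_ctl` and `rsync`, whose internals the slides do not show. Like the shell script, which has no `set -e`, the sketch brings the server back up even if the copy fails.

```python
import time


def deploy_with_drain(mark_down, sync_files, mark_up, settle_seconds=1.0, sleep=time.sleep):
    """Run the wrapper sequence from the slides: down, rsync, sleep, up."""
    mark_down()  # stands in for `mercari_app_ctl down`
    try:
        sync_files()  # stands in for `rsync $*`
        sleep(settle_seconds)  # stands in for `sleep 1`
    finally:
        # Mirrors the shell script, which reaches `mercari_app_ctl up`
        # regardless of rsync's exit status.
        mark_up()


# Usage with fake callbacks that only record the call order.
calls = []
deploy_with_drain(
    mark_down=lambda: calls.append("down"),
    sync_files=lambda: calls.append("rsync"),
    mark_up=lambda: calls.append("up"),
    sleep=lambda seconds: calls.append(f"sleep {seconds}"),
)
print(calls)
```

Passing the steps in as callables keeps the ordering logic testable without touching nginx or the file system.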
  29. Who kicks Ansible? (same deploy diagram: ansible on the deploy host, rsync through the wrappers on the App servers, nginx with dynamic upstream)
  30. One day last year: an infrastructure engineer would look at the whiteboard and launch Ansible by hand, with a human touch.

  31. Now Google Calendar + ChatOps

  32. Inside ChatOps (diagram labels) • @bot: yes • check PullReq (LGTM?) • check Ticket (ReleaseOK?) • merge PR • git clone • preprocessing • exec Ansible • git clone • deploy application
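The two checks on the ChatOps slide ("LGTM?" for the pull request, "ReleaseOK?" for the ticket) act as a gate before Ansible runs. A minimal Python sketch of such a gate follows; `ReleaseRequest` and `release_blockers` are hypothetical names, and how the real bot queries pull requests and tickets is not shown in the slides.

```python
from dataclasses import dataclass


@dataclass
class ReleaseRequest:
    pull_request_lgtm: bool  # slide labels: "check PullReq", "LGTM?"
    ticket_release_ok: bool  # slide labels: "check Ticket", "ReleaseOK?"


def release_blockers(request):
    """Return the reasons a deploy must not start; an empty list means go."""
    blockers = []
    if not request.pull_request_lgtm:
        blockers.append("pull request has no LGTM")
    if not request.ticket_release_ok:
        blockers.append("ticket is not marked ReleaseOK")
    return blockers


print(release_blockers(ReleaseRequest(pull_request_lgtm=True, ticket_release_ok=False)))
```

Returning every blocker, rather than stopping at the first, lets a bot report all missing approvals in one chat message.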
  33. 10+ Deploys per Day • Zero downtime deployment • ChatOps + Calendar • Higher deploy frequency: the foundation of continuous application improvement
  34. Monitoring

  35. Monitoring • Log monitoring: fluentd, Norikra, Mackerel • Agent based monitoring: Mackerel, NewRelic, Kurado
  36. Log analysis system (diagram) • Sources: App, Batch, Worker • Logs: access_log, application_log, app_error_log, error_log, php_log... • Destinations: BigQuery, Kibana, Log Viewer • Stream Processing (CEP) to Mackerel and Slack
  37. Norikra is Open Source Software for “Stream Processing” by SQL

  38. Norikra (diagram): Application, log, Norikra (Filtering, Aggregation, Summarize), result, mackerel (Visualize, Alert Notification)
  39. Norikra + mackerel

    SELECT
      COUNT(1, status like "5%") AS count_5xx,
      COUNT(1, status like "4%") AS count_4xx,
      COUNT(1, status like "3%") AS count_3xx,
      COUNT(1, status like "2%") AS count_2xx
    FROM mercari_access_log.win:time_batch(1 min)
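To make the aggregation concrete, here is a small Python sketch of what the query on slide 39 computes for one 1-minute batch: a count per status class. It does not use Norikra; `count_status_classes` is a hypothetical helper and the sample statuses are made up.

```python
def count_status_classes(statuses):
    """Count statuses by leading digit, like the slide's COUNT(1, status like "5%")."""
    counts = {f"count_{digit}xx": 0 for digit in "5432"}
    for status in statuses:
        key = f"count_{str(status)[:1]}xx"
        if key in counts:  # statuses outside 2xx-5xx are ignored, as in the query
            counts[key] += 1
    return counts


# One made-up batch of access-log statuses.
batch = ["200", "200", "302", "404", "500", "200", "503"]
print(count_status_classes(batch))
```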
  40. Norikra + mackerel

    SELECT
      avg(ptime) AS ptime_avg,
      percentiles(ptime, {90,95,98,99}) AS percentile
    FROM mercari_access_log.win:time_batch(1 min)

    Alert if 95%tile > n msec
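The latency query and its alert rule can be illustrated with a short Python sketch. Two assumptions are worth flagging: `ptime` is treated as milliseconds (the slide only gives the threshold unit), and percentiles use the nearest-rank method, which may differ from how Norikra's `percentiles` function interpolates. The function names are hypothetical.

```python
import math


def nearest_rank_percentile(values, percent):
    """Nearest-rank percentile; an assumption, not necessarily Norikra's method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_ptime(ptimes, threshold_msec):
    """Compute the slide's ptime_avg and percentiles, plus the 95%tile alert flag."""
    percentiles = {p: nearest_rank_percentile(ptimes, p) for p in (90, 95, 98, 99)}
    return {
        "ptime_avg": sum(ptimes) / len(ptimes),
        "percentile": percentiles,
        "alert": percentiles[95] > threshold_msec,  # "Alert if 95%tile > n msec"
    }


# One made-up batch of response times in msec; 500 is an arbitrary threshold.
print(summarize_ptime([120, 80, 95, 300, 110, 90, 105, 100, 85, 2000], threshold_msec=500))
```

Alerting on a high percentile rather than the average catches a slow tail that the mean can hide.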
  41. Norikra + slack

    SELECT "[" || hostname || "] ```" || message || "```"
    FROM error_mercari_api.win:time_batch(1 min)
    WHERE message like "%PHP Fatal error%"
    GROUP BY hostname, message
    HAVING COUNT(*) > 0
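The Slack query reduces each 1-minute batch to one message per distinct (hostname, message) pair that contains a PHP fatal error. A minimal Python sketch of that filtering and grouping, with a hypothetical function name and made-up events:

```python
FENCE = "`" * 3  # Slack code-block marker, as concatenated in the slide's query


def fatal_error_notifications(events):
    """Return one formatted line per distinct (hostname, message) fatal error."""
    seen = set()
    notifications = []
    for event in events:
        if "PHP Fatal error" not in event["message"]:
            continue  # WHERE message like "%PHP Fatal error%"
        key = (event["hostname"], event["message"])
        if key in seen:
            continue  # GROUP BY hostname, message collapses repeats
        seen.add(key)
        notifications.append(f"[{key[0]}] {FENCE}{key[1]}{FENCE}")
    return notifications


# Made-up events for one batch; the hostnames are placeholders.
events = [
    {"hostname": "app01", "message": "PHP Fatal error: Call to undefined function foo()"},
    {"hostname": "app01", "message": "PHP Fatal error: Call to undefined function foo()"},
    {"hostname": "app02", "message": "PHP Warning: something minor"},
]
for line in fatal_error_notifications(events):
    print(line)
```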
  42. Agent based monitoring

  43. Mackerel, New Relic, Kurado • Alert • Service Metrics Graph • Server Metrics • PHP Apps Metrics
  44. mackerel: Metrics & Alert Hub (diagram) • Sources: Worker, Batch, App, MySQL, cron • Via: mackerel-agent, fluent-plugin-mackerel, mkr • Threshold configuration, metrics collection, notification
  45. mackerel custom plugins • my-ec2-tag[go], jmx-get[go], diff-detector[go], delay-checker, interval-checker, periodic-checker, check-mysql-uptime[go], check-memcached-uptime[go], check-conntrack-free, check-crt-expiration, check-dns-rr, check-hydra-pos, check-inode, check-iptables, check-myip, check-solr-replication, check-solr-update, check-uptime, check-mysql-msr[go], mackerel-plugin-accelmail-counter, mackerel-plugin-gaurun-usage, mackerel-plugin-linux-lite, mackerel-plugin-msr[go], mackerel-plugin-ntpq, mackerel-plugin-search, mackerel-plugin-postfix, mackerel-plugin-php-and-accesslog, mackerel-plugin-php-version • Wrote 25+ custom plugins and utility commands for monitoring Mercari infrastructure
  46. Kurado • github.com/kazeburo/Kurado • RRDTool based server metrics tool • Author: Me • Can show multiple servers, or two time-series graphs, side by side, which makes it easy to follow relationships between servers and long-term trends
  47. Kurado: able to display 100+ graphs at once, because they are PNGs.
  48. Monitoring • Log based Monitoring • Agent based Monitoring • Monitor servers and services from many angles • Improve infrastructure and service reliability

  49. Summary

  50. Summary • Technologies that support continuous application improvement • 10+ Deploys per Day • Safe Deploy with ngx_dynamic_upstream • Google Calendar + ChatOps • Monitoring • Log Monitoring • Agent based Monitoring
  51. Summary (recap): SRE@Mercari • Find and resolve system problems to improve service reliability • Develop and operate middleware that lets the service scale

  52. We’re Hiring!! We look forward to hearing from you!