Slide 1

Slide 1 text

Technologies Supporting Continuous Application Improvement at Mercari
Retty Tech Cafe #5
Masahiro Nagano a.k.a. kazeburo

Slide 2

Slide 2 text

Me
• Masahiro Nagano
• @kazeburo
• Mercari, Inc. Principal Engineer, Site Reliability Engineering Team
• BASE, Inc. Technical Advisor
• Fukuoka Ruby Award, GMO Pepabo Prize

Slide 3

Slide 3 text

SRE@Mercari
Find and solve system problems to improve service reliability
Develop and operate the middleware that lets the service scale

Slide 4

Slide 4 text

Agenda
• Mercari & Infrastructure
• 10+ Deploys per Day
  • Raise deploy frequency to speed up improvement
• Monitoring
  • Improve service reliability

Slide 5

Slide 5 text

Mercari Your Friendly Mobile MarketPlace JP US

Slide 6

Slide 6 text

Mercari KPI
DOWNLOAD: 32 million downloads (JP+US)
GMV: over 10 billion yen per month
LISTED ITEMS: hundreds of thousands or more per day

Slide 7

Slide 7 text

Traffic 1,200,000 reqs/min (API/HTTPS)

Slide 8

Slide 8 text

Infrastructure
JP: Sakura Internet, Ishikari DC (dedicated servers and cloud)
US: Amazon Web Services, US West (Oregon) Region
Other services: Akamai, Google BigQuery, mackerel, Amazon Route53, Amazon S3, Amazon CloudFront

Slide 9

Slide 9 text

Architecture
[Diagram: the JP and US stacks have the same shape. Mobile clients reach three nginx servers through DNS-RR over SPDY/HTTP2; nginx fronts six App servers, which use MySQL and memcached. The US stack is drawn with EC2 and RDS icons.]

Slide 10

Slide 10 text

Architecture
[Diagram: the JP and US stacks have the same shape. Mobile clients reach three nginx servers through DNS-RR over SPDY/HTTP2; nginx fronts six App servers, which use MySQL and memcached. The US stack is drawn with EC2 and RDS icons.]
Avoid vendor lock-in
Deployable to any IaaS or DC: UK, Asia, etc.

Slide 11

Slide 11 text

Software
nginx/OpenResty, Apache
PHP 5.6, Go, Node, Python, Perl
MySQL 5.6, Q4M, memcached
widebullet, Solr, Gaurun, fluentd
Norikra, Kibana, kurado
Consul, unbound
We ❤ OSS. Stay tuned: https://github.com/mercari/

Slide 12

Slide 12 text

10+ Deploys per Day without downtime and 50x errors

Slide 13

Slide 13 text

Deployment of Applications

Slide 14

Slide 14 text

JUST COPYING FILES
No restart needed

Slide 15

Slide 15 text

JUST COPYING FILES?
No restart needed

Slide 16

Slide 16 text

Deployment of PHP apps
[Diagram: a timeline of one rsync run. Files arrive in order: index.php, required_1.php, required_2.php, required_4.php (NEW). The new index.php contains require "required_1.php", require "required_2.php", require "required_3.php", and require "required_4.php" (NEW). Client requests along the timeline get 200 OK, 500 ISE, 500 ISE, 200 OK: requests that arrive while the copy is in progress fail.]
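The failure window on this slide can be reproduced with a small model. This is only a sketch, not Mercari's tooling: it assumes the old release already had required_1.php to required_3.php on disk, that the new index.php takes effect the moment it lands, and that a request returns 500 whenever a required file is missing. The function names are made up for illustration.

```python
# Hypothetical model of a non-atomic file copy; file names mirror the slide.

def missing_requires(requires, files_on_disk):
    """Return the required files that are not on disk yet."""
    return [name for name in requires if name not in files_on_disk]


def status_during_copy(old_files, old_requires, new_requires, copy_order):
    """Return the HTTP status a request would get before the copy
    and after each individual file has been copied."""
    on_disk = set(old_files)
    requires = list(old_requires)
    # Before the copy starts, the old release is consistent.
    statuses = [200 if not missing_requires(requires, on_disk) else 500]
    for name in copy_order:
        on_disk.add(name)
        if name == "index.php":
            # Assumption: the new index.php is served as soon as it lands.
            requires = list(new_requires)
        statuses.append(200 if not missing_requires(requires, on_disk) else 500)
    return statuses


old_files = {"index.php", "required_1.php", "required_2.php", "required_3.php"}
old_requires = ["required_1.php", "required_2.php", "required_3.php"]
new_requires = old_requires + ["required_4.php"]
copy_order = ["index.php", "required_1.php", "required_2.php", "required_4.php"]

print(status_during_copy(old_files, old_requires, new_requires, copy_order))
```

In this model every request between the arrival of index.php and the arrival of required_4.php fails, which is the gap the zero downtime techniques on the following slides are meant to close.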

Slide 17

Slide 17 text

Zero downtime deployment
• Blue-Green Deployment
• Symlink Swapping Deployment
• Request Pausing Deployment
• Mostly cloud-specific. Requires 2x servers.
• (Maybe) slow
• pixiv: WEB+DB PRESS Vol.84
• Etsy: codeascraft.com/2013/07/01/atomic-deploys-at-etsy/
• Complex opcache/apc operation
• Mercari: combined with ngx_dynamic_upstream

Slide 18

Slide 18 text

ngx_dynamic_upstream
• github.com/cubicdaiya/ngx_dynamic_upstream
• nginx extension for operating upstreams dynamically
• cubicdaiya
  • Principal Engineer at Mercari, Inc.
  • SRE

Slide 19

Slide 19 text

ngx_dynamic_upstream

upstream backend {
    zone backend_zone 1m;
    server 127.0.0.1:6001;
    server 127.0.0.1:6002;
    server 127.0.0.1:6003;
}

server {
    listen 6000;

    location /dynamic {
        allow 127.0.0.1;
        deny all;
        dynamic_upstream;
    }

    location / {
        proxy_pass http://backend;
    }
}

DOWN
$ curl "127.0.0.1:6000/dynamic?upstream=backend_zone&server=127.0.0.1:6003&down="

UP
$ curl "127.0.0.1:6000/dynamic?upstream=backend_zone&server=127.0.0.1:6003&up="
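A deploy script has to build these two control URLs for every backend it touches. The helper below is a minimal sketch of that step in Python. The function name is hypothetical, and the only parameters used (upstream, server, down, up) are the ones that appear in the curl commands on this slide.

```python
from urllib.parse import urlencode


def dynamic_upstream_url(control, zone, server, action):
    """Build the control URL that marks one backend up or down.

    control: host:port of the nginx server exposing /dynamic
    zone:    shared memory zone name of the upstream (backend_zone above)
    server:  backend address exactly as written in the upstream block
    action:  "up" or "down"
    """
    if action not in ("up", "down"):
        raise ValueError("action must be 'up' or 'down'")
    # safe=":" keeps the colon in host:port unescaped, matching the curl examples.
    query = urlencode({"upstream": zone, "server": server, action: ""}, safe=":")
    return f"http://{control}/dynamic?{query}"


print(dynamic_upstream_url("127.0.0.1:6000", "backend_zone", "127.0.0.1:6003", "down"))
print(dynamic_upstream_url("127.0.0.1:6000", "backend_zone", "127.0.0.1:6003", "up"))
```

Rejecting anything other than "up" or "down" keeps a typo in the deploy script from silently sending a request that changes nothing.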

Slide 20

Slide 20 text

Zero downtime deployment
[Diagram: the same rsync timeline as before (index.php, required_1.php, required_2.php, required_4.php NEW), now behind nginx + ngx_dynamic_upstream. The server is marked DOWN before the copy and UP after it. Requests that arrive during the copy are sent to another upstream, so every request gets 200 OK.]

Slide 21

Slide 21 text

Zero down time deployment
[Diagram: a deploy host runs ansible, which rsyncs to four App servers, each through an rsync wrapper. Three nginx servers run dynamic upstream.]

Slide 22

Slide 22 text

Zero down time deployment
[Diagram: a deploy host runs ansible, which rsyncs to four App servers, each through an rsync wrapper. Three nginx servers run dynamic upstream.]
Ansible runs: rsync --rsync-path=/path/to/wrapper
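rsync's --rsync-path option tells the sending side which program to run on the remote host, which is how the wrapper gets inserted without touching the App servers' rsync binary. A minimal sketch of building that command line follows. The -a flag, the host name, and the directory paths are placeholders chosen for illustration; the slide only shows the --rsync-path option.

```python
def rsync_command(src, host, dest, wrapper="/path/to/wrapper"):
    """Return the argv for one rsync run that goes through the remote wrapper.

    Only --rsync-path comes from the slide; -a and the argument layout
    are assumptions for this sketch.
    """
    return ["rsync", "-a", f"--rsync-path={wrapper}", src, f"{host}:{dest}"]


# Hypothetical host and paths, for illustration only.
argv = rsync_command("./release/", "app01.example.internal", "/var/www/app/")
print(" ".join(argv))
# To actually run it: subprocess.run(argv, check=True)
```

Returning a list instead of a shell string avoids quoting problems when the command is later passed to subprocess.run.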

Slide 23

Slide 23 text

Zero down time deployment
[Diagram: a deploy host runs ansible, which rsyncs to four App servers, each through an rsync wrapper. Three nginx servers run dynamic upstream.]
Ansible runs: rsync --rsync-path=/path/to/wrapper
The wrapper on each App server:

    #!/bin/sh
    # Take this server out of rotation, copy the files, then put it back.
    mercari_app_ctl down
    rsync "$@"
    sleep 1
    mercari_app_ctl up

Slide 24

Slide 24 text

Zero down time deployment
[Diagram: a deploy host runs ansible, which rsyncs to four App servers, each through an rsync wrapper. Three nginx servers run dynamic upstream.]
Ansible runs: rsync --rsync-path=/path/to/wrapper
The wrapper on each App server:

    #!/bin/sh
    mercari_app_ctl down
    rsync "$@"
    sleep 1
    mercari_app_ctl up

Step 1: "down" is sent to each of the three nginx dynamic upstreams.

Slide 25

Slide 25 text

Zero down time deployment
[Diagram: a deploy host runs ansible, which rsyncs to four App servers, each through an rsync wrapper. Three nginx servers run dynamic upstream.]
Ansible runs: rsync --rsync-path=/path/to/wrapper
The wrapper on each App server:

    #!/bin/sh
    mercari_app_ctl down
    rsync "$@"
    sleep 1
    mercari_app_ctl up

Step 2: rsync copies the files while the server is out of rotation.

Slide 26

Slide 26 text

Zero down time deployment
[Diagram: a deploy host runs ansible, which rsyncs to four App servers, each through an rsync wrapper. Three nginx servers run dynamic upstream.]
Ansible runs: rsync --rsync-path=/path/to/wrapper
The wrapper on each App server:

    #!/bin/sh
    mercari_app_ctl down
    rsync "$@"
    sleep 1
    mercari_app_ctl up

Step 3: after rsync finishes, "up" is sent to each of the three nginx dynamic upstreams.

Slide 27

Slide 27 text

Zero down time deployment
[Diagram: a deploy host runs ansible, which rsyncs to four App servers, each through an rsync wrapper. Three nginx servers run dynamic upstream.]
The wrapper on each App server:

    #!/bin/sh
    mercari_app_ctl down
    rsync "$@"
    sleep 1
    mercari_app_ctl up
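The wrapper's order of operations (down, rsync, short pause, up) can be sketched in Python with the side effects injected, which makes the sequence easy to test. This is an illustration, not the actual wrapper: mercari_app_ctl is not public, so app_ctl here is a stand-in callable. One deliberate difference from the shell script is labeled in the code: the sketch returns rsync's exit status and always runs the "up" step.

```python
import time


def run_wrapped(rsync, app_ctl, settle_seconds=1.0, sleep=time.sleep):
    """Run down -> rsync -> pause -> up and return rsync's exit code.

    rsync:   callable that performs the copy and returns an exit code
    app_ctl: callable taking "down" or "up" (stand-in for mercari_app_ctl)
    """
    app_ctl("down")
    try:
        exit_code = rsync()
        # Same short pause as the `sleep 1` in the shell wrapper.
        sleep(settle_seconds)
    finally:
        # Assumption of this sketch: always return the server to rotation,
        # even if the copy raised an error.
        app_ctl("up")
    return exit_code


calls = []
code = run_wrapped(
    rsync=lambda: calls.append("rsync") or 0,
    app_ctl=lambda action: calls.append(action),
    sleep=lambda seconds: calls.append("sleep"),
)
print(calls, code)
```

Whether a server should come back up after a failed copy is a policy choice; a stricter variant could leave it down and alert instead.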

Slide 28

Slide 28 text

"

Slide 29

Slide 29 text

Who kicks Ansible?
[Diagram: a deploy host runs ansible, which rsyncs to four App servers, each through an rsync wrapper. Three nginx servers run dynamic upstream.]

Slide 30

Slide 30 text

One day last year
Infrastructure engineers looked at the whiteboard and kicked off Ansible by hand, with a warm human touch

Slide 31

Slide 31 text

Now Google Calendar + ChatOps

Slide 32

Slide 32 text

Inside ChatOps
[Diagram: replying "@bot: yes" starts the flow. The bot runs check PullReq (LGTM?), check Ticket (ReleaseOK?), merge PR, and exec Ansible. The Ansible side runs git clone, preprocessing, and deploy application.]
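The two questions in the diagram (LGTM? and ReleaseOK?) act as gates in front of the merge and the Ansible run. A minimal sketch of that gating logic is below. The class, field names, and messages are invented for illustration; the slide does not show how the bot is implemented or how it queries pull requests and tickets.

```python
from dataclasses import dataclass


@dataclass
class ReleaseRequest:
    """Hypothetical view of one release as the bot sees it."""
    pull_request_lgtm: bool   # answer to "LGTM?" on the pull request
    ticket_release_ok: bool   # answer to "ReleaseOK?" on the ticket


def next_steps(request):
    """Return the steps the bot would take after someone answers '@bot: yes'."""
    if not request.pull_request_lgtm:
        return ["reject: pull request has no LGTM"]
    if not request.ticket_release_ok:
        return ["reject: ticket is not marked ReleaseOK"]
    # Both gates passed: merge, then hand over to Ansible.
    return ["merge PR", "exec Ansible"]


print(next_steps(ReleaseRequest(pull_request_lgtm=True, ticket_release_ok=True)))
print(next_steps(ReleaseRequest(pull_request_lgtm=True, ticket_release_ok=False)))
```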

Slide 33

Slide 33 text

10+ Deploys per Day
Zero down time deployment
ChatOps + Calendar
Higher deploy frequency
The foundation of continuous application improvement

Slide 34

Slide 34 text

Monitoring

Slide 35

Slide 35 text

Monitoring
• Log monitoring
  • fluentd, Norikra, Mackerel
• Agent based monitoring
  • Mackerel, New Relic, Kurado

Slide 36

Slide 36 text

Log analysis system
[Diagram: App, Batch, and Worker servers send their logs (access_log, application_log, app_error_log, error_log, php_log, ...) to a Log tier. From there the logs go to BigQuery, Kibana, a Log Viewer, and stream processing (cep). Stream processing output goes to Mackerel and Slack.]

Slide 37

Slide 37 text

Norikra is Open Source Software for “Stream Processing” by SQL

Slide 38

Slide 38 text

Norikra
[Diagram: Application sends its log to Norikra, which does filtering, aggregation, and summarizing. The result goes to mackerel for visualization, alerts, and notification.]

Slide 39

Slide 39 text

Norikra + mackerel

    SELECT
      COUNT(1, status like "5%") AS count_5xx,
      COUNT(1, status like "4%") AS count_4xx,
      COUNT(1, status like "3%") AS count_3xx,
      COUNT(1, status like "2%") AS count_2xx
    FROM mercari_access_log.win:time_batch(1 min)

The result is sent to mackerel.
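What this query computes is a count per status class for each one-minute batch. The same aggregation can be sketched in plain Python over a list of records. This only illustrates the windowing idea and is not how Norikra executes the query; the record shape (epoch seconds, status string) is an assumption.

```python
from collections import Counter, defaultdict


def count_status_classes(records):
    """Group (epoch_seconds, status) records into one-minute windows and
    count statuses by class, using the same names as the query's columns."""
    windows = defaultdict(Counter)
    for epoch, status in records:
        minute_start = int(epoch) // 60 * 60
        # "502" -> "count_5xx", "200" -> "count_2xx", and so on.
        windows[minute_start][f"count_{str(status)[0]}xx"] += 1
    return {minute: dict(counts) for minute, counts in sorted(windows.items())}


# Made-up sample records for illustration.
sample = [(0, "200"), (10, "502"), (59, "404"), (60, "200"), (61, "200")]
print(count_status_classes(sample))
```

A spike in count_5xx for one window is the kind of signal an alerting rule on the mackerel side can act on.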

Slide 40

Slide 40 text

Norikra + mackerel

    SELECT
      avg(ptime) AS ptime_avg,
      percentiles(ptime, {90,95,98,99}) AS percentile
    FROM mercari_access_log.win:time_batch(1 min)

The result is sent to mackerel. Alert if the 95th percentile > n msec.
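Alerting on a percentile rather than the average keeps a handful of very slow requests from being hidden by many fast ones, while still ignoring a single outlier. The sketch below uses the nearest-rank definition of a percentile, which is an assumption; Norikra's percentiles function may interpolate differently. The threshold value stands in for the "n msec" on the slide.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer percentage p (1..100)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Ceiling of p * n / 100 using integer arithmetic only.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


def should_alert(ptimes_msec, threshold_msec, p=95):
    """True when the p-th percentile of one window exceeds the threshold."""
    return percentile(ptimes_msec, p) > threshold_msec


# Made-up processing times for one window, in milliseconds.
one_outlier = [10] * 19 + [900]
two_outliers = [10] * 18 + [900, 900]
print(should_alert(one_outlier, threshold_msec=500))
print(should_alert(two_outliers, threshold_msec=500))
```

With 20 samples, one slow request sits above the 95th percentile and does not trigger the alert, while two slow requests do.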

Slide 41

Slide 41 text

Norikra + slack

    SELECT
      "[" || hostname || "] ```" || message || "```"
    FROM error_mercari_api.win:time_batch(1 min)
    WHERE message like "%PHP Fatal error%"
    GROUP BY hostname, message
    HAVING COUNT(*) > 0
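The effect of GROUP BY hostname, message is that each distinct fatal error is reported once per host per window instead of once per log line. A rough Python equivalent of that de-duplication and formatting is shown below. It is a sketch only; the event shape is assumed, and posting to Slack is left out.

```python
FENCE = "`" * 3  # three backticks, so Slack renders the message as a code block


def fatal_error_messages(events):
    """Turn one window of (hostname, message) events into Slack messages,
    one per distinct (hostname, message) pair that is a PHP fatal error."""
    seen = []
    for hostname, message in events:
        if "PHP Fatal error" not in message:
            continue
        pair = (hostname, message)
        if pair not in seen:
            seen.append(pair)
    return [f"[{hostname}] {FENCE}{message}{FENCE}" for hostname, message in seen]


# Made-up events for illustration.
window = [
    ("web01", "PHP Fatal error: Call to undefined function foo()"),
    ("web01", "PHP Fatal error: Call to undefined function foo()"),
    ("web02", "PHP Notice: Undefined index"),
]
for line in fatal_error_messages(window):
    print(line)
```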

Slide 42

Slide 42 text

Agent based monitoring

Slide 43

Slide 43 text

[Table: Mackerel, New Relic, and Kurado compared on Alert, Service Metrics Graph, Server Metrics, and PHP Apps Metrics]

Slide 44

Slide 44 text

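mackerel: Metrics & Alert Hub
[Diagram: Worker, Batch, App, and MySQL servers and cron report to mackerel through mackerel-agent, fluent-plugin-mackerel, and mkr. mackerel handles threshold configuration, metrics collection, and notification.]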

Slide 45

Slide 45 text

mackerel custom plugins
my-ec2-tag[go], jmx-get[go], diff-detector[go], delay-checker, interval-checker, periodic-checker, check-mysql-uptime[go], check-memcached-uptime[go], check-conntrack-free, check-crt-expiration, check-dns-rr, check-hydra-pos, check-inode, check-iptables, check-myip, check-solr-replication, check-solr-update, check-uptime, check-mysql-msr[go], mackerel-plugin-accelmail-counter, mackerel-plugin-gaurun-usage, mackerel-plugin-linux-lite, mackerel-plugin-msr[go], mackerel-plugin-ntpq, mackerel-plugin-search, mackerel-plugin-postfix, mackerel-plugin-php-and-accesslog, mackerel-plugin-php-version
Wrote 25+ custom plugins and utility commands for monitoring Mercari infrastructure
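Check plugins of this kind usually report their result through the Nagios-style exit code convention (0 OK, 1 WARNING, 2 CRITICAL). As an illustration of the shape of such a check, here is a sketch of the decision logic for a certificate expiration check. It is not the source of check-crt-expiration, which the slide does not show; the thresholds, the message format, and the dates are made up.

```python
from datetime import datetime, timezone

OK, WARNING, CRITICAL = 0, 1, 2


def check_expiration(not_after, now, warn_days=30, crit_days=7):
    """Return (exit_code, message) for a certificate expiring at not_after.

    warn_days and crit_days are hypothetical defaults for this sketch.
    """
    days_left = (not_after - now).days
    if days_left < crit_days:
        return CRITICAL, f"CRITICAL: {days_left} days left"
    if days_left < warn_days:
        return WARNING, f"WARNING: {days_left} days left"
    return OK, f"OK: {days_left} days left"


# Made-up dates for illustration.
now = datetime(2016, 1, 1, tzinfo=timezone.utc)
print(check_expiration(datetime(2016, 6, 1, tzinfo=timezone.utc), now))
print(check_expiration(datetime(2016, 1, 21, tzinfo=timezone.utc), now))
print(check_expiration(datetime(2016, 1, 4, tzinfo=timezone.utc), now))
```

A real plugin would read the certificate's expiry date, print the message, and exit with the returned code.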

Slide 46

Slide 46 text

Kurado
• github.com/kazeburo/Kurado
• RRDTool based server metrics tool
• Author: Me
• Can display multiple servers, or two time-series graphs, side by side, which makes relationships between servers and long-term trends easy to follow

Slide 47

Slide 47 text

Kurado Able to display 100+ graphs at once. Because it’s PNG.

Slide 48

Slide 48 text

Monitoring
Log based Monitoring
Agent based Monitoring
Monitor servers and services from various angles
Improve infrastructure and service reliability

Slide 49

Slide 49 text

Summary

Slide 50

Slide 50 text

Summary
• Technologies supporting continuous application improvement
• 10+ Deploys per Day
  • Safe Deploy with ngx_dynamic_upstream
  • Google Calendar + ChatOps
• Monitoring
  • Log Monitoring
  • Agent based Monitoring

Slide 51

Slide 51 text

Summary (recap)
SRE@Mercari
Find and solve system problems to improve service reliability
Develop and operate the middleware that lets the service scale

Slide 52

Slide 52 text

We’re Hiring!!
We look forward to hearing from you!