Technologies Supporting Continuous Application Improvement at Mercari / Continuous improvement applications and Mercari SRE #retty_tech_cafe

Technologies Supporting Continuous Application Improvement at Mercari - Retty Tech Cafe #5, 2016/03/12

kazeburo

March 12, 2016

Transcript

  1. Technologies Supporting Continuous Application Improvement at Mercari
    Retty Tech Cafe #5
    Masahiro Nagano a.k.a. kazeburo

  2. Me
    • Masahiro Nagano
    • @kazeburo
    • Mercari, Inc. Principal Engineer, Site Reliability Engineering Team
    • BASE, Inc. Technical Advisor
    • Fukuoka Ruby Award, GMO Pepabo Prize

  3. SRE@Mercari
    Find and resolve system problems to improve service reliability
    Develop and operate middleware that lets the service scale

  4. Agenda
    • Mercari & Infrastructure
    • 10+ Deploys per Day
    • Raise deploy frequency to speed up improvement
    • Monitoring
    • Improve service reliability

  5. Mercari
    Your Friendly Mobile MarketPlace
    JP US

  6. Mercari KPI
    DOWNLOAD: 32 million downloads (JP+US)
    GMV: over 10 billion yen per month
    LISTED ITEMS: several hundred thousand or more per day

  7. Traffic
    1,200,000 reqs/min
    (API/HTTPS)

  8. Infrastructure
    JP: Sakura Internet, Ishikari DC (dedicated servers and cloud)
    US: Amazon Web Services, US West (Oregon) Region
    Akamai, Google BigQuery, mackerel
    Amazon Route53, Amazon S3, Amazon CloudFront

  9. Architecture
    [Architecture diagram: JP and US run the same stack. Mobile clients reach nginx over SPDY/HTTP2 via DNS round robin (DNS-RR); nginx forwards to App servers backed by MySQL and memcached. In the US the servers are EC2 instances with RDS.]

  10. Architecture
    [Same architecture diagram as slide 9, with annotations:]
    Avoid vendor lock-in
    Deployable to any IaaS or DC
    UK, Asia, etc.

  11. Software
    nginx/OpenResty, Apache
    PHP 5.6, Go, Node, Python, Perl
    MySQL 5.6, Q4M, memcached
    widebullet, Solr, Gaurun, fluentd
    Norikra, Kibana, kurado
    Consul, unbound
    We ♥ OSS
    Stay tuned: https://github.com/mercari/

  12. 10+ Deploys per Day
    without downtime and 50x errors

  13. Deployment of Applications

  14. JUST COPYING FILES
    No restart required

  15. JUST COPYING FILES?
    No restart required

  16. Deployment of PHP apps
    [Diagram: rsync copies files to the app server over time while client requests keep arriving.]
    Files on the server: index.php, required_1.php, required_2.php, required_4.php (NEW)
    index.php:
      require "required_1.php"
      require "required_2.php"
      require "required_3.php"
      require "required_4.php" (NEW)
    Responses over the course of the rsync: 200 OK, 500 ISE, 500 ISE, 200 OK

  17. Zero downtime deployment
    • Blue-Green Deployment
    • Symlink Swapping Deployment
    • Request Pausing Deployment
    • Mostly cloud-specific. Requires 2x servers.
    • (Maybe) slow
    • pixiv: WEB+DB PRESS Vol.84
    • Etsy: codeascraft.com/2013/07/01/atomic-deploys-at-etsy/
    • Complex opcache/apc operation
    • Mercari: combined with ngx_dynamic_upstream

  18. ngx_dynamic_upstream
    • github.com/cubicdaiya/ngx_dynamic_upstream
    • nginx extension for operating upstreams dynamically
    • cubicdaiya
    • Principal Engineer at Mercari, Inc.
    • SRE

  19. ngx_dynamic_upstream
    upstream backend {
        zone backend_zone 1m;
        server 127.0.0.1:6001;
        server 127.0.0.1:6002;
        server 127.0.0.1:6003;
    }
    server {
        listen 6000;
        location /dynamic {
            allow 127.0.0.1;
            deny all;
            dynamic_upstream;
        }
        location / {
            proxy_pass http://backend;
        }
    }
    DOWN:
    $ curl "127.0.0.1:6000/dynamic?upstream=backend_zone&server=127.0.0.1:6003&down="
    UP:
    $ curl "127.0.0.1:6000/dynamic?upstream=backend_zone&server=127.0.0.1:6003&up="
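
The curl calls above can also be issued from a deploy script. The following is a minimal Python sketch, assuming the control endpoint, zone name, and server address shown on the slide; the function names `control_url` and `set_server_state` are hypothetical and not part of ngx_dynamic_upstream.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the control endpoint from the slide's nginx config.
CONTROL_ENDPOINT = "http://127.0.0.1:6000/dynamic"


def control_url(zone: str, server: str, action: str) -> str:
    """Build the URL that marks `server` in `zone` as "up" or "down"."""
    if action not in ("up", "down"):
        raise ValueError(f"unsupported action: {action!r}")
    # safe=":" keeps host:port unescaped, matching the curl examples.
    query = urlencode({"upstream": zone, "server": server, action: ""}, safe=":")
    return f"{CONTROL_ENDPOINT}?{query}"


def set_server_state(zone: str, server: str, action: str, timeout: float = 2.0) -> str:
    """Send the request and return the response body (needs a running nginx)."""
    with urlopen(control_url(zone, server, action), timeout=timeout) as resp:
        return resp.read().decode()
```

Calling `set_server_state("backend_zone", "127.0.0.1:6003", "down")` would send the same request as the DOWN curl example; building the URL is kept separate so it can be checked without a live nginx.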

  20. Zero downtime deployment
    [Diagram: the same rsync timeline as slide 16, now behind nginx + ngx_dynamic_upstream.]
    Before the rsync the app server is marked DOWN and requests go to another upstream; after the rsync it is marked UP again.
    All responses: 200 OK

  21. Zero downtime deployment
    [Diagram: a deploy host runs Ansible, which rsyncs to each App server through a wrapper. The nginx servers in front of the App servers use dynamic upstream.]

  22. Zero downtime deployment
    [Diagram: the same deploy flow. Ansible invokes rsync with a wrapper as the remote rsync command:]
    rsync --rsync-path=/path/to/wrapper

  23. Zero downtime deployment
    [Diagram: the same deploy flow, showing the wrapper script that runs on each App server:]
    #!/bin/sh
    # take this app server out of rotation
    mercari_app_ctl down
    # run the real rsync with the arguments passed by the client
    rsync "$@"
    sleep 1
    # put this app server back into rotation
    mercari_app_ctl up
    Invoked from the deploy host as:
    rsync --rsync-path=/path/to/wrapper
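
The wrapper's sequence (down, rsync, short pause, up) can be written so that each step is testable. This is a minimal Python sketch, not Mercari's actual wrapper: `mercari_app_ctl` is the command named on the slide, while the function name `drain_sync_restore` and the injectable `run` and `pause` parameters are hypothetical.

```python
import subprocess
import time


def drain_sync_restore(rsync_args, run=subprocess.run, pause=time.sleep):
    """Take the app out of rotation, run rsync, then put it back."""
    run(["mercari_app_ctl", "down"], check=True)
    # Pass the arguments through untouched, like "$@" in a shell script.
    result = run(["rsync", *rsync_args], check=False)
    # Same one-second pause as the shell wrapper before going back up.
    pause(1)
    run(["mercari_app_ctl", "up"], check=True)
    # Report rsync's exit status so the calling rsync client sees failures.
    return result.returncode
```

As in the shell version, the server is brought back up regardless of rsync's exit status; whether a failed copy should instead keep the server out of rotation is a design decision the slides do not cover.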

  24. Zero downtime deployment
    [Diagram: step 1 of the wrapper script from slide 23. mercari_app_ctl down marks the App server as down on each nginx (down, down, down).]

  25. Zero downtime deployment
    [Diagram: step 2 of the wrapper script from slide 23. rsync copies the files while the App server receives no requests.]

  26. Zero downtime deployment
    [Diagram: step 3 of the wrapper script from slide 23. After the rsync, mercari_app_ctl up marks the App server as up on each nginx (up, up, up).]

  27. Zero downtime deployment
    [Diagram: the full flow. Ansible on the deploy host runs rsync, and the wrapper script from slide 23 on each App server toggles dynamic upstream on the nginx servers around the copy.]

  29. Who kicks Ansible?
    [Diagram: the same deploy flow. A deploy host runs Ansible, which rsyncs to the App servers behind nginx with dynamic upstream.]

  30. One day last year
    Infrastructure engineers looked at the whiteboard
    and launched Ansible by hand, with a warm human touch

  31. Now
    Google Calendar + ChatOps

  32. Inside ChatOps
    @bot: yes
    check PullReq: LGTM?
    check Ticket: ReleaseOK?
    exec Ansible
    merge PR
    git clone
    deploy application
    git clone preprocessing
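
The checks in this flow can be read as a gate in front of Ansible. The sketch below is an illustration only, assuming the bot runs Ansible once the pull request has an LGTM, the ticket is marked ReleaseOK, and someone answers "yes"; the `DeployRequest` type, the `next_action` function, and the returned strings are hypothetical, and the order of the checks is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DeployRequest:
    pull_request_lgtm: bool   # "check PullReq: LGTM?"
    ticket_release_ok: bool   # "check Ticket: ReleaseOK?"
    confirmed: bool           # someone answered "@bot: yes"


def next_action(req: DeployRequest) -> str:
    """Return what the bot should do next for this request."""
    if not req.pull_request_lgtm:
        return "wait: pull request needs LGTM"
    if not req.ticket_release_ok:
        return "wait: ticket is not ReleaseOK"
    if not req.confirmed:
        return "ask: deploy now?"
    return "exec Ansible"
```

Keeping the decision in a pure function makes the gate easy to test separately from the chat and Ansible integrations.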

  33. 10+ Deploys per Day
    Zero downtime deployment
    ChatOps + Calendar
    Higher deploy frequency
    The foundation of continuous application improvement

  34. Monitoring

  35. Monitoring
    • Log monitoring
    • fluentd, Norikra, Mackerel
    • Agent based monitoring
    • Mackerel, New Relic, Kurado

  36. Log analysis system
    [Diagram. Log sources: App, Batch, and Worker hosts. Logs: access_log, application_log, app_error_log, error_log, php_log... Destinations shown: BigQuery, Kibana, Log Viewer, and stream processing (CEP) feeding Mackerel and Slack.]

  37. Norikra is Open Source Software for
    “Stream Processing” by SQL

  38. Norikra
    [Diagram: the application's log goes into Norikra (filtering, aggregation, summarizing); the result goes to mackerel (visualize, alert, notification).]

  39. Norikra + mackerel
    SELECT
      COUNT(1, status like "5%") AS count_5xx,
      COUNT(1, status like "4%") AS count_4xx,
      COUNT(1, status like "3%") AS count_3xx,
      COUNT(1, status like "2%") AS count_2xx
    FROM mercari_access_log.win:time_batch(1 min)
    Results are sent to mackerel.
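
To make the query's result concrete, here is a minimal Python sketch of the same aggregation over one batch of status codes. It illustrates what the query computes and is not how Norikra evaluates it; the function name `count_status_classes` is hypothetical.

```python
def count_status_classes(statuses):
    """Count 2xx/3xx/4xx/5xx responses in one batch of status codes."""
    counts = {f"count_{digit}xx": 0 for digit in "5432"}
    for status in statuses:
        # The first character of the status decides the class ("5%" etc.).
        key = f"count_{str(status)[:1]}xx"
        if key in counts:
            counts[key] += 1
    return counts
```

For a one-minute batch containing 200, 200, 302, 404, 500 and 503, the sketch yields two 2xx, one 3xx, one 4xx and two 5xx, which corresponds to the four columns the query would send to mackerel.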

  40. Norikra + mackerel
    SELECT
      avg(ptime) AS ptime_avg,
      percentiles(ptime, {90,95,98,99}) AS percentile
    FROM mercari_access_log.win:time_batch(1 min)
    Results are sent to mackerel.
    Alert if 95%tile > n msec
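
The alert rule on this slide can be sketched in Python as follows. The nearest-rank percentile used here is an assumption for illustration and may differ from the method behind Norikra's `percentiles`; the names `nearest_rank_percentile` and `summarize_ptime` and the `threshold` parameter are hypothetical.

```python
def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    # Integer form of ceil(pct * n / 100), avoiding float rounding.
    rank = (pct * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def summarize_ptime(ptimes, threshold, pcts=(90, 95, 98, 99)):
    """Average, percentiles, and a 95th-percentile alert flag for one batch."""
    if not ptimes:
        raise ValueError("empty batch")
    ordered = sorted(ptimes)
    return {
        "ptime_avg": sum(ordered) / len(ordered),
        "percentile": {p: nearest_rank_percentile(ordered, p) for p in pcts},
        # threshold must be in the same unit as ptime.
        "alert": nearest_rank_percentile(ordered, 95) > threshold,
    }
```

Alerting on the 95th percentile rather than the average keeps a small number of very slow requests visible even when most requests are fast.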

  41. Norikra + slack
    SELECT
      "[" || hostname || "] ```" || message || "```"
    FROM error_mercari_api.win:time_batch(1 min)
    WHERE message like "%PHP Fatal error%"
    GROUP BY hostname, message
    HAVING COUNT(*) > 0
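
The effect of this query is one Slack message per distinct host and error message in each batch. A minimal Python sketch of that grouping and formatting, assuming each log record is a dict with `hostname` and `message` keys (the record shape and the function name `fatal_error_messages` are hypothetical):

```python
FENCE = "`" * 3  # Slack code-block delimiter, as in the query's string literal


def fatal_error_messages(records, needle="PHP Fatal error"):
    """Format one message per distinct (hostname, message) fatal error."""
    seen = set()
    messages = []
    for record in records:
        key = (record["hostname"], record["message"])
        # WHERE message like "%PHP Fatal error%" plus GROUP BY hostname, message
        if needle in record["message"] and key not in seen:
            seen.add(key)
            messages.append(f"[{record['hostname']}] {FENCE}{record['message']}{FENCE}")
    return messages
```

Grouping by host and message keeps a burst of identical fatal errors from flooding the channel within a single batch.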

  42. Agent based monitoring

  43. Mackerel New Relic
    Kurado
    Alert
    Service Metrics
    Graph
    Server Metrics
    PHP
    Apps Metrics

  44. mackerel
    Worker Batch
    App App
    MySQL cron
    mackerel-agent
    fluent-plugin-mackerel
    mkr
    Threshold settings
    Metrics collection
    Notifications
    Metrics & Alert Hub

  45. mackerel custom plugins
    my-ec2-tag[go], jmx-get[go], diff-detector[go], delay-checker, interval-checker,
    periodic-checker, check-mysql-uptime[go], check-memcached-uptime[go],
    check-conntrack-free, check-crt-expiration, check-dns-rr, check-hydra-pos,
    check-inode, check-iptables, check-myip, check-solr-replication,
    check-solr-update, check-uptime, check-mysql-msr[go],
    mackerel-plugin-accelmail-counter, mackerel-plugin-gaurun-usage,
    mackerel-plugin-linux-lite, mackerel-plugin-msr[go]
    mackerel-plugin-ntpq, mackerel-plugin-search, mackerel-plugin-postfix
    mackerel-plugin-php-and-accesslog, mackerel-plugin-php-version
    Wrote 25+ custom plugins and utility commands
    for monitoring Mercari infrastructure
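
For readers unfamiliar with mackerel plugins, a custom metrics plugin is, as far as the mackerel-agent convention goes, a command that prints one tab-separated line per metric: name, value, epoch seconds. The Python sketch below shows that output format only; it is not one of the plugins listed above, and the function name `metric_line` and the metric name in the usage note are hypothetical.

```python
import time


def metric_line(name, value, epoch=None):
    """Format one metric as "name<TAB>value<TAB>epoch seconds"."""
    if epoch is None:
        # Default to the current time when the caller gives no timestamp.
        epoch = int(time.time())
    return f"{name}\t{value}\t{epoch}"
```

A plugin built on this would print, for example, `metric_line("example.queue_length", 42)` once per run and leave scheduling and posting to mackerel-agent.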

  46. Kurado
    • github.com/kazeburo/Kurado
    • RRDtool-based server metrics tool
    • Author: Me
    • Can display multiple servers, or two time-series graphs, side by side. Makes it easy to follow relationships between servers and long-term trends.

  47. Kurado
    Can display 100+ graphs at once, because the graphs are PNG images.

  48. Monitoring
    Log based Monitoring
    Agent based Monitoring
    Monitor servers and services from many angles
    Improve the reliability of infrastructure and services

  49. Summary

  50. Summary
    • Technologies that support continuous application improvement
    • 10+ Deploys per Day
    • Safe Deploy with ngx_dynamic_upstream
    • Google Calendar + ChatOps
    • Monitoring
    • Log Monitoring
    • Agent based Monitoring

  51. Summary (recap)
    SRE@Mercari
    Find and resolve system problems to improve service reliability
    Develop and operate middleware that lets the service scale

  52. We’re Hiring!!
    We look forward to hearing from you!