Site::Reliability::Engineering - YAPC::Hokkaido 2016 Sapporo

kazeburo
December 12, 2016

Transcript

  1. Site::Reliability::Engineering YAPC::Hokkaido 2016 SAPPORO Masahiro Nagano @kazeburo

  2. Me • Masahiro Nagano • @kazeburo • Mercari, Inc Principal Engineer, Site Reliability Engineering (SRE) Team • BASE, Inc Technical Advisor
  3. CPAN/Perl • Gazelle • Cookie::Baker(::XS) • WWW::Form::UrlEncoded(::XS) • HTTP::Entity::Parser • Apache::LogFormat::Compiler • Plack::Middleware::ServerStatus::Lite • GrowthForecast
  4. Agenda • What is Site Reliability Engineering (SRE)? • Mercari and Mercari's SRE • Mercari SRE case studies • Wrap-up &
  5. What is SRE?

  6. Site Reliability Engineering • Proposed by Google • Software engineers/teams that work across Google's many products and services to improve site reliability
  7. Google SRE

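  8. Google SRE • Everyone is a software engineer • No distinction in hiring between development teams and SRE • Engineers can also take a six-month SRE training, and some become SREs that way • SLA and Error Budget • Development teams and SRE share the SLA and the error budget, which resolves the conflict between adding new features and keeping the service stable • The SLA is decided per service • When the error budget looks likely to run out, the team is expected to concentrate on development that improves reliability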
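The error-budget idea on this slide can be made concrete with a little arithmetic. The sketch below is illustrative only: the 99.9% availability target, the 30-day window and the function names are assumptions, not figures from the talk.

```python
def error_budget_minutes(availability_target, window_days=30):
    """Minutes of downtime allowed in the window for a given target (e.g. 0.999)."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - availability_target)


def budget_remaining(availability_target, downtime_minutes, window_days=30):
    """Budget left after the downtime already incurred; negative means overspent."""
    return error_budget_minutes(availability_target, window_days) - downtime_minutes


def should_focus_on_reliability(availability_target, downtime_minutes, window_days=30):
    """True once the budget is used up, i.e. feature work should give way."""
    return budget_remaining(availability_target, downtime_minutes, window_days) <= 0
```

With a 99.9% target over 30 days the budget is about 43 minutes, so 50 minutes of downtime would put the team into reliability-only work under this rule.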
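  9. Google SRE • OnCall (duty rotation) • Everyone on the SRE team takes a regular turn • Time spent on operations is capped at 50% • The remaining time goes to software development that improves reliability • This is the soil in which automation and load-balancing software grows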
  10. Google SRE • “what happens when a software engineer is tasked with what used to be called operations” https://cloudplatform.googleblog.com/2016/07/adventures-in-SRE-land-welcome-to-Google-Mission-Control.html • Google Vice President of Engineering Ben Treynor Sloss, who coined the term SRE • “our work is like being a part of the world’s most intense pit crew. We change the tires of a race car as it’s going 100mph” https://students.googleblog.com/2012/06/site-reliability-engineers-worlds-most.html
  11. SRE hiring (US) • Facebook, Twitter, Apple, eBay, Microsoft.. • 3,753

  12. SRE in JP • "Infrastructure engineer / DevOps" is at its peak • Attention has grown since SRE was introduced on the Mercari Tech Blog in 2015/11 • “The infrastructure team has become the Site Reliability Engineering (SRE) team” http://tech.mercari.com/entry/2015/11/18/153421 • 2016/06 SRE Tech Talk#1 • https://connpass.com/event/34825/
  13. SRE in JP • Cybozu: “We are setting up an SRE team” • http://blog.cybozu.io/entry/2016/09/01/080000 • Cookpad: “This is Sugawara from the SRE team, lately battling Elastic Beanstalk and ECS” • http://techlife.cookpad.com/entry/2016/10/06/000000 • Retty: “Not infrastructure! What is an 'SRE', the engineer who raises reliability?” • https://www.wantedly.com/companies/retty/posts/17568
  14. SRE hiring (JP) • 17!

  15. Mercari & Mercari SRE

  16. Mercari Your Friendly Mobile MarketPlace JP US

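  17. Mercari • Japan's largest flea-market app • List items easily from a smartphone, and they sell quickly • Safe and secure payments • Also operating in the US; preparing to launch in the UK

  18. Mercari KPI • Downloads: 60 million (JP+US) • GMV: over 10 billion yen per month • Listed items: over 1 million per day

  19. Infrastructure • JP: Sakura Internet, Ishikari DC • US: AWS • UK (in preparation): GCP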

  20. None
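  21. Sakura dedicated servers • Not colocation or rental servers, but MaaS • Can be fitted with ioMemory; high performance • Reboot, OS reinstall, console access and more from the control panel, usable like a cloud • Private network construction • Interconnection with Sakura Cloud and colocation services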
  22. Architecture (JP) • Diagram labels: DNS-RR, nginx, App, MySQL, memcached, util, cloud • Based on a dedicated-server configuration
  23. Architecture (JP, US) • Diagram labels: DNS-RR, nginx, App, MySQL, memcached, util, cloud; the US side runs on EC2
  24. Architecture (JP, US) • Same diagram as the previous slide • The same configuration in the cloud too
  25. Architecture • A common maintainability and scalability strategy • > Operation by a small team • > Withstood reaching #3 in the US App Store ranking • Choose the best infrastructure at the time each service is rolled out • > UK is on GCP
  26. Mercari SRE Team

  27. None
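  28. Mercari SRE • Renamed to SRE in 2015/11; until then it was the infrastructure team • Six members, though two are currently on loan to other teams • Everyone works at the Tokyo office
  29. Why we changed to SRE • To go beyond the scope of work that the word "infrastructure" suggests • "Reliability", so that people keep using Mercari for 5 or 10 years • On smartphones, access comes 24 hours a day, whenever and from wherever users like; "availability" and "performance" make that access comfortable at any time • A team that solves these through software engineering • (A name that is also understood overseas and in the US)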
  30. OSS by SRE • Gaurun - General push notification server in Go • https://github.com/mercari/gaurun • WideBullet - an API gateway with JSON-RPC • https://github.com/mercari/widebullet • Mackerel Plugins • https://github.com/kazeburo/custom-mackerel-plugins
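  31. Mercari SRE's work • Maintain and improve the availability and performance of API servers and middleware • OnCall (incident response rotation) • Build and operate the log collection and analysis platform • Maintain server provisioning and deployment • Ensure security • Maintain development environments and the like
  32. OnCall • Rotates weekly, from Sunday 0:00 to Saturday 23:59 • Receive alerts and handle emergencies • To avoid every SRE being on a commuter train at the same time, on weekdays the on-call member stays at home until other members reach the office • On days off too, it is desirable to start responding within 15-20 minutes, which restricts what you can do • Late-night alerts and holiday responses are also a burden on family. I am always grateful for the cooperation of my wife, son and daughter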
  33. Emergency call by Bot ☎

  34. Mercari SRE case studies

  35. (1/3) search pre-cacher

  36. Nginx as an Internal LB • Diagram labels: App, DNS-RR, nginx, Solr, clustering with consul, private network
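  37. Overload of Solr • A project to show recommended items, based on user attributes, in one of the Home tabs • Solr is used to fetch the item list • When it was released to a small percentage of users in an A/B test, Solr load turned out to be far higher than expected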
  38. Nginx as a Useful L7-LB • Diagram labels: App, DNS-RR, nginx, Solr, clustering with consul, private network • Cache!! Here
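  39. Cache is NOT a silver bullet • Slow when no cache entry exists; the user experience gets worse • Cache thundering problem • e.g. proxy_cache_lock • => Could the cache be read ahead instead? • There are too many patterns to build the cache in a batch • We want to refresh it frequently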
  40. make Search pre-cacher • Diagram labels: App, nginx, Solr, log daemon, tail access.log, http req
  41.
        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::UserAgent;

        while (1) {
            # Read the most recent 5000 lines of the access log.
            open(my $fh, "-|", "tail", "-n", "5000", "/var/log/nginx/access.log") or die $!;
            my %path_query;
            while (<$fh>) {
                next if m!pre-cacher!;    # skip requests made by this script
                next if $_ !~ m!(cachemode=readahead|group.limit)!;
                if ( m!\suri:(.+?)\s! ) {
                    $path_query{"$1"} = 1;    # hash keys de-duplicate the URIs
                }
            }
            close($fh);

            my $ua = LWP::UserAgent->new(agent => "pre-cacher");
            $ua->timeout(20);
            for my $path_query ( keys %path_query ) {
                # Replay each request through the local nginx to warm its cache.
                my $req = HTTP::Request->new( GET => 'http://localhost' . $path_query );
                $req->header('Host' => 'lb-search');
                $ua->request($req);
            }
            sleep 5;
        }
  42. Search pre-cacher • Written in 3 minutes by a certain Perl Monger SRE (me) • When sending a heavy Solr query, developers only need to add readahead to the query_string and it is read ahead automatically
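The filtering step of the pre-cacher can be restated compactly. The following is a minimal Python sketch of the same selection logic as the Perl loop on slide 41, not the production code. It assumes a whitespace- or tab-separated access log with a `uri:` field; the function name `uris_to_prefetch` and the sample field layout are hypothetical. Unlike the Perl pattern, the dot in `group.limit` is escaped here.

```python
import re

URI_FIELD = re.compile(r"\suri:(.+?)\s")
WANTED = re.compile(r"cachemode=readahead|group\.limit")


def uris_to_prefetch(log_lines):
    """Return the unique request URIs that should be replayed to warm the cache."""
    seen = set()
    for line in log_lines:
        if "pre-cacher" in line:
            continue  # ignore requests issued by the pre-cacher itself
        if not WANTED.search(line):
            continue  # only queries that opted in to read-ahead
        match = URI_FIELD.search(line)
        if match:
            seen.add(match.group(1))
    return sorted(seen)
```

Skipping lines that mention `pre-cacher` is what keeps the loop from feeding on its own requests; without it every warmed URI would be re-queued forever.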
  43. (2/3) CoW in PHP

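  44. JobWorker in Mercari • Diagram labels: App, enqueue, Q4M, dequeue, Parent, fork(2), child, php-parallel-prefork, clustering with consul
  45. JobWorker in Mercari • Q4M + Parallel::Prefork • For reasons we cannot go into, max-request-per-child is "1" • The challenge is speeding up worker processing
  46. Copy on Write (CoW) • When fork(2) is called, memory is not copied from the parent process to the child; it is copied only when it is first modified • Saves memory by sharing it with the parent process • Lowers load by sharing computed results • A mod_perl monger will tell you about it with tears in their eyes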
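The preload-then-fork pattern behind copy-on-write can be sketched in a few lines. This is an illustrative Python example rather than the talk's PHP worker: `expensive_load` is a hypothetical stand-in for costly start-up work, and the pipe is only there so the parent can observe what each child saw. Note that CPython's reference counting writes to object headers, so actual page sharing is less complete than in C or Perl; the point shown here is that children reuse work done once in the parent.

```python
import os


def expensive_load():
    """Stand-in for costly start-up work such as loading classes or config."""
    return {i: i * i for i in range(100_000)}


def run_workers(n_children):
    """Load once in the parent, then fork children that reuse the loaded data."""
    table = expensive_load()  # done before fork, so children inherit it
    results = []
    for _ in range(n_children):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: reads the inherited table without reloading it.
            os.close(read_fd)
            os.write(write_fd, str(table[300]).encode())
            os._exit(0)
        # Parent: collect the child's answer, then reap it.
        os.close(write_fd)
        with os.fdopen(read_fd) as reader:
            results.append(int(reader.read()))
        os.waitpid(pid, 0)
    return results
```

Each child does only a lookup, because the expensive step already ran before `os.fork()`; moving `expensive_load()` inside the child branch would repeat it for every job.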
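  47. CoW in PHP • No effect with mod_php: there is no option for running something when the parent process starts • Usable from the CLI • PHP culture is to autoload a class when it is first used • > So CoW does not happen naturally • Proved the benefit with a benchmark to get the change adopted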
  48.
        function t_sendmail() {
            $smtp = new SimpleMailWithSwift();
            $smtp->send($params);   // $params is not defined on the slide
        }

        // t_sendmail(); // send a mail in the parent process to preload the classes
        for ( $i = 0; $i < 300; $i++ ) {
            $pid = pcntl_fork();
            if ($pid == -1) {
                die('could not fork');
            } else if ($pid) {
                pcntl_wait($status);   // parent: wait for the child to finish
            } else {
                // child process
                t_sendmail();
                exit(0);
            }
        }

      28.639s => 22.048s
  49. CoW in PHP JobWorker • The parent process of the mail-sending worker sends one dummy mail, which loads the Swift-related classes • CoW reduces the load on child processes and speeds them up

  50. CoW in PHP JobWorker

  51. (3/3) URI Shorten Service

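  52. In-house URL shortener • For use in large-scale mail delivery • > Millisecond-level performance, so URLs can be generated on the fly • Usable with low latency from anywhere in JP/US/UK • The host part of the URI is common to all regions • https://example.ly/abcd1234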
  53. Where should we place the infrastructure?

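  54. RTT between regions • Diagram labels: 石 (Ishikari), AWS, 東 (Tokyo), GCP, GCP • RTTs shown: 18ms, 110ms, 140ms, 6ms • Approximate values
  55. RTT between regions • Diagram labels: 石 (Ishikari), AWS, 東 (Tokyo), GCP, GCP • RTTs shown: 18ms, 110ms, 140ms, 6ms • Approximate values • The system needs to be deployed in a distributed way • Lookups are consolidated in one place
  56. Distributed architecture • Diagram labels: 石 (Ishikari), AWS, 東 (Tokyo), GCP, GCP

  57. Distributed architecture • Diagram labels: 石 (Ishikari), AWS, 東 (Tokyo), GCP, GCP

  58. Distributed architecture • Diagram labels: 石 (Ishikari), AWS, 東 (Tokyo), GCP, GCP • Hosts: j.example.ly, u.example.ly, g.example.ly

  59. Distributed architecture • Diagram labels: 石 (Ishikari), AWS, 東 (Tokyo), GCP, GCP • Hosts: j.example.ly, u.example.ly, g.example.ly

  60. Distributed architecture • Diagram labels: 石 (Ishikari), AWS, 東 (Tokyo), GCP, GCP • Hosts: j.example.ly, u.example.ly, g.example.ly • Pattern: example.ly/.+([a-z])
  61. Distributed architecture • Diagram labels: 石 (Ishikari), AWS, 東 (Tokyo), GCP, GCP • Hosts: j.example.ly, u.example.ly, g.example.ly • Pattern: example.ly/.+([a-z])
  62. Distributed architecture • Diagram labels: 石 (Ishikari), AWS, 東 (Tokyo), GCP, GCP • Hosts: j.example.ly, u.example.ly, g.example.ly • Pattern: example.ly/.+([a-z])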
  63. In-house URL shortener • Regional API • Go + MySQL • Placed inside the private network • Global API • GAE/Go / less operational effort • Placed in the US region
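The `example.ly/.+([a-z])` pattern on the preceding slides suggests that the last character of a short path identifies the region that issued it. The sketch below is one plausible reading, written in Python rather than the Go used by the real service: the mapping of `j`, `u` and `g` to the three regional hosts and the function name `regional_url` are assumptions drawn from the diagram labels, not a description of Mercari's implementation.

```python
import re
from urllib.parse import urlsplit

# Assumed mapping from the trailing region letter to a regional host.
REGION_HOSTS = {
    "j": "j.example.ly",
    "u": "u.example.ly",
    "g": "g.example.ly",
}

# Path form of the slide's pattern: anything, then one trailing lowercase letter.
PATH_PATTERN = re.compile(r"^/.+([a-z])$")


def regional_url(short_url):
    """Return the regional URL a global endpoint could redirect to, or None."""
    parts = urlsplit(short_url)
    match = PATH_PATTERN.match(parts.path)
    if match is None:
        return None
    host = REGION_HOSTS.get(match.group(1))
    if host is None:
        return None  # trailing letter does not name a known region
    return f"{parts.scheme}://{host}{parts.path}"
```

Encoding the region in the path lets a single global host decide where to send a request with no database lookup, which fits the stated goal of keeping the host part common to all regions.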
  64. Wrap-up & Site::Reliability::Engineer(ing)? & Perl (monger)?

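  65. Desirable abilities for an SRE

  66. Computer science • Of course, YES • But it can also be learned along the way

  67. Laziness, Impatience, Hubris • A mindset that surely lives in every Perl Monger

  68. • Laziness • > Automating infrastructure and operations • Impatience • > Pursuing high performance • Hubris • > Handling many users, large-scale traffic and all kinds of data • > Pride in the work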
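  69. Is Perl a good tool for SRE? • Perl is still used inside countless systems around the world, but it is not the latest cool programming language

  70. Characteristics of Perl

  71. • TMTOWTDI • Powerful string processing • One-liners • Usable in many environments • CPAN • Other people's code is unreadable
  72. DSL for System Call • C-friendly • Maps 1:1 to system calls • > fork, waitpid, syswrite, sysread and so on • Socket handling is straightforward too • Fewer quirks than Ruby or PHP • A good language for learning about the OS and networking (I think)
  73. The best textbooks • @kazuho: Starlet, Server-Starter, Parallel::Prefork, HTTP::Parser::XS, Furl, kaztools and many!! • Perl Mongers are always on the highway to solving problems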
  74. SRE as the next career for a Perl Monger • Why not aim for it?

  75. We are waiting for you, along with “good problems”

  76. That's all.