Slide 1

Slide 1 text

Site::Reliability::Engineering YAPC::Hokkaido 2016 SAPPORO Masahiro Nagano @kazeburo

Slide 2

Slide 2 text

Me
• Masahiro Nagano
• @kazeburo
• Mercari, Inc.: Principal Engineer, Site Reliability Engineering (SRE) Team
• BASE, Inc.: Technical Advisor

Slide 3

Slide 3 text

CPAN/Perl
• Gazelle
• Cookie::Baker(::XS)
• WWW::Form::UrlEncoded(::XS)
• HTTP::Entity::Parser
• Apache::LogFormat::Compiler
• Plack::Middleware::ServerStatus::Lite
• GrowthForecast

Slide 4

Slide 4 text

Agenda
• What is Site Reliability Engineering (SRE)?
• Mercari and Mercari's SRE
• Mercari SRE case studies
• Wrap-up &

Slide 5

Slide 5 text

What is SRE?

Slide 6

Slide 6 text

Site Reliability Engineering
• Proposed by Google
• Software engineers/teams that work across Google's many products and services to improve site reliability

Slide 7

Slide 7 text

Google SRE

Slide 8

Slide 8 text

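Google SRE
• Everyone is a software engineer
  > Hiring makes no distinction between development teams and SRE
  > Engineers can also take six months of SRE training, and some become SREs that way
• SLA and Error Budget
  > Sharing the SLA and error budget between the development team and SRE resolves the conflict between adding new features and keeping the service stable
  > The SLA is decided per service
  > When the error budget is about to run out, the team is expected to concentrate on development that improves reliability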
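The error-budget idea above comes down to simple arithmetic. The Perl sketch below assumes the SLA is an availability target over a fixed window (99.9% over 30 days leaves about 43.2 minutes of allowed downtime); the 25% threshold in the hypothetical focus() helper is an illustrative policy, not Google's or Mercari's.

```perl
use strict;
use warnings;

# Allowed downtime (minutes) for an availability target over a window.
# Assumption: the target is a fraction, e.g. 0.999 for 99.9%.
sub error_budget_minutes {
    my ($target, $window_days) = @_;
    die "target must be between 0 and 1\n" unless $target > 0 && $target < 1;
    my $window_minutes = $window_days * 24 * 60;
    return $window_minutes * (1 - $target);
}

# Fraction of the budget left after $downtime_minutes of failures.
sub error_budget_remaining {
    my ($target, $window_days, $downtime_minutes) = @_;
    my $budget = error_budget_minutes($target, $window_days);
    return ($budget - $downtime_minutes) / $budget;
}

# Hypothetical policy: below the threshold, work shifts to reliability.
sub focus {
    my ($remaining, $threshold) = @_;
    $threshold //= 0.25;
    return $remaining < $threshold ? 'reliability' : 'features';
}

printf "%.1f min\n", error_budget_minutes(0.999, 30);   # 99.9% over 30 days
```

With 40 minutes of downtime already spent in the window, error_budget_remaining(0.999, 30, 40) is below 0.25, so focus() returns 'reliability'.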

Slide 9

Slide 9 text

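Google SRE
• OnCall (duty rotation)
  > Everyone on an SRE team takes regular shifts
• Time spent on operations is capped at 50%
  > The rest goes to software development that improves reliability
  > This becomes fertile ground for automation and load-balancing software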

Slide 10

Slide 10 text

Google SRE
• “what happens when a software engineer is tasked with what used to be called operations”
  > Google Vice President of Engineering Ben Treynor Sloss, who coined the term SRE
  > https://cloudplatform.googleblog.com/2016/07/adventures-in-SRE-land-welcome-to-Google-Mission-Control.html
• “our work is like being a part of the world’s most intense pit crew. We change the tires of a race car as it’s going 100mph”
  > https://students.googleblog.com/2012/06/site-reliability-engineers-worlds-most.html

Slide 11

Slide 11 text

SRE hiring (US)
Facebook, Twitter, Apple, eBay, Microsoft...
3,753

Slide 12

Slide 12 text

SRE in JP
• "Infrastructure engineer/DevOps" is still the dominant term
• Attention has grown since Mercari introduced SRE on the Mercari Tech Blog in 2015/11
  > “The infrastructure team has been renamed the Site Reliability Engineering (SRE) team”
  > http://tech.mercari.com/entry/2015/11/18/153421
• 2016/06 SRE Tech Talk#1
  > https://connpass.com/event/34825/

Slide 13

Slide 13 text

SRE in JP
• Cybozu: “We are establishing an SRE team”
  > http://blog.cybozu.io/entry/2016/09/01/080000
• Cookpad: “Hi, I'm … from the SRE team, which has lately been battling Elastic Beanstalk and ECS”
  > http://techlife.cookpad.com/entry/2016/10/06/000000
• Retty: “Not infrastructure! What is ‘SRE’, the engineer who raises reliability?”
  > https://www.wantedly.com/companies/retty/posts/17568

Slide 14

Slide 14 text

SRE hiring (JP)
17!

Slide 15

Slide 15 text

Mercari & Mercari SRE

Slide 16

Slide 16 text

Mercari Your Friendly Mobile MarketPlace JP US

Slide 17

Slide 17 text

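Mercari
• Japan's largest flea-market app
• List items easily from your smartphone, and they sell quickly
• Safe and secure payments
• Also operating in the US
  > Preparing to launch in the UK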

Slide 18

Slide 18 text

Mercari KPI
• Downloads: 60 million (JP+US)
• GMV: over ¥10 billion per month
• Listed items: over 1 million per day

Slide 19

Slide 19 text

Infrastructure
• JP: Sakura Internet, Ishikari DC
• US: AWS
• UK (in preparation): GCP

Slide 21

Slide 21 text

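Sakura Internet dedicated servers
• Not colocation or rental servers, but MaaS (Metal as a Service)
• Can be fitted with ioMemory for high performance
• Reboots, OS reinstalls, console access and more from the control panel, so they can be used like a cloud
• Private networks can be built
  > Interconnects with Sakura Cloud and colocation services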

Slide 22

Slide 22 text

Architecture
[Diagram (JP): clients, DNS-RR, nginx ×3, App ×6, MySQL ×2, memcached ×2, util and cloud hosts]
The setup is based on dedicated servers

Slide 23

Slide 23 text

Architecture
[Diagram: the JP stack next to a US stack on AWS EC2 with the same components: DNS-RR, nginx ×3, App ×6, MySQL ×2, memcached ×2, util hosts]

Slide 24

Slide 24 text

Architecture
[Diagram: the JP stack next to a US stack on AWS EC2 with the same components: DNS-RR, nginx ×3, App ×6, MySQL ×2, memcached ×2, util hosts]
The same architecture in the cloud too

Slide 25

Slide 25 text

Architecture
• A common strategy for maintainability and scalability
  > Operated by a small team
  > Withstood reaching #3 in the US App Store rankings
• Choose the best infrastructure when launching the service in each region
  > UK: GCP

Slide 26

Slide 26 text

Mercari SRE Team

Slide 28

Slide 28 text

Mercari SRE
• Renamed to SRE in 2015/11
  > Until then it was the infrastructure team
• 6 members, though 2 are currently seconded to other teams
• Everyone works at the Tokyo office

Slide 29

Slide 29 text

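Why we changed the name to SRE
• Our work goes beyond what people picture when they hear "infrastructure"
• "Reliability", so that people keep using Mercari for 5 or 10 years
• On smartphones, access arrives 24 hours a day, whenever and wherever users like; "availability" and "performance" so that Mercari is always comfortable to use
• A team that solves these problems through software engineering
• (A name that is also understood overseas and in the US)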

Slide 30

Slide 30 text

OSS by SRE
• Gaurun - General push notification server in Go
  > https://github.com/mercari/gaurun
• WideBullet - an API gateway with JSON-RPC
  > https://github.com/mercari/widebullet
• Mackerel Plugins
  > https://github.com/kazeburo/custom-mackerel-plugins
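Mackerel agent plugins report custom metrics by printing one line per metric to stdout in the form name, value and epoch seconds, separated by tabs. A minimal Perl sketch of that output contract; the loadavg.* metric names are hypothetical and are not taken from the custom-mackerel-plugins repository.

```perl
use strict;
use warnings;

# One metric line: "name<TAB>value<TAB>epoch seconds".
sub metric_line {
    my ($name, $value, $epoch) = @_;
    return join("\t", $name, $value, $epoch) . "\n";
}

# Hypothetical example: turn the contents of /proc/loadavg into metric lines.
sub loadavg_metrics {
    my ($loadavg_text, $epoch) = @_;
    my ($l1, $l5, $l15) = split /\s+/, $loadavg_text;
    return (
        metric_line('loadavg.loadavg1',  $l1,  $epoch),
        metric_line('loadavg.loadavg5',  $l5,  $epoch),
        metric_line('loadavg.loadavg15', $l15, $epoch),
    );
}

# Print real values only where /proc/loadavg exists (Linux).
if (open my $fh, '<', '/proc/loadavg') {
    my $text = <$fh>;
    close $fh;
    print loadavg_metrics($text, time);
}
```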

Slide 31

Slide 31 text

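What Mercari SRE does
• Maintains and improves the availability and performance of API servers and middleware
• OnCall (incident-response rotation)
• Builds and operates the log collection and analysis platform
• Server provisioning and deployment tooling
• Ensures security
• Maintains development environments and similar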

Slide 32

Slide 32 text

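OnCall
• One-week shifts, from Sunday 0:00 to Saturday 23:59
• Receives alerts and handles emergencies
• So that every SRE is never on a commuter train at the same time, on weekdays the on-call member waits at home until other members arrive at the office
• On holidays too, starting a response within 15-20 minutes is desirable, which limits what you can do
Late-night alerts and holiday responses are also a burden on the family. I'm always grateful to my wife, son and daughter for their support.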
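The weekly rotation described above (Sunday 0:00 to Saturday 23:59) can be computed from a reference Sunday. A sketch under assumptions: the member names and start date are hypothetical, and dates are treated as UTC calendar days so that time-zone rules cannot move a rotation boundary.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(floor);

# Days since the Unix epoch for a calendar date (UTC midnight).
sub day_number {
    my ($y, $m, $d) = @_;
    return int(timegm(0, 0, 0, $d, $m - 1, $y) / 86400);
}

# Who is on call on ($y, $m, $d)? Shifts run Sunday 0:00 to Saturday 23:59.
# $first_sunday is [y, m, d] of a Sunday on which $members->[0] began a shift.
sub on_call {
    my ($members, $first_sunday, $y, $m, $d) = @_;
    my $start = day_number(@$first_sunday);
    my $week  = floor((day_number($y, $m, $d) - $start) / 7);
    # Perl's % with a positive right operand never returns a negative index,
    # so dates before the reference Sunday still map onto the rotation.
    return $members->[$week % @$members];
}

my @team = qw(alice bob carol);                 # hypothetical members
print on_call(\@team, [2016, 12, 4], 2016, 12, 11), "\n";
```

2016-12-04 was a Sunday, so in this example alice covers Dec 4-10, bob Dec 11-17 and carol Dec 18-24.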

Slide 33

Slide 33 text

Emergency call by Bot ☎

Slide 34

Slide 34 text

Mercari SREࣄྫ

Slide 35

Slide 35 text

(1/3) search pre-cacher

Slide 36

Slide 36 text

Nginx as an Internal LB
[Diagram: App servers, nginx internal load balancers (clustering with consul), Solr ×4, DNS-RR, private network]

Slide 37

Slide 37 text

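Overload of Solr
• A project to show recommended items, based on user attributes, in one of the Home tabs
• Solr is used to fetch the item lists
• When it was released to a small share of users in an A/B test, Solr's load turned out to be far higher than expected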

Slide 38

Slide 38 text

Nginx as a Useful L7 LB
[Diagram: the same layout, with the cache placed on the nginx L7 LB: "Cache!! Here"]

Slide 39

Slide 39 text

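Cache is NOT a silver bullet
• On a cache miss the response is slow, which hurts the user experience
• Cache Thundering Problem (thundering herd on a miss)
  > e.g. proxy_cache_lock
• => Can we fill the cache ahead of time (read-ahead)?
  > There are too many query patterns to build the cache in a batch
  > We want to refresh it frequently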

Slide 40

Slide 40 text

Make a search pre-cacher
[Diagram: App servers, nginx LBs, Solr ×4; a log daemon tails nginx's access.log and sends HTTP requests]

Slide 41

Slide 41 text

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

while (1) {
    # Read the most recent 5,000 lines of the nginx access log (LTSV)
    open(my $fh, "-|", "tail", "-n", "5000", "/var/log/nginx/access.log") or die $!;
    my %path_query;
    while (<$fh>) {
        next if m!pre-cacher!;                               # skip our own requests
        next if $_ !~ m!(cachemode=readahead|group.limit)!;  # only flagged heavy queries
        $path_query{$1} = 1 if m!\suri:(.+?)\s!;             # dedupe by URI
    }
    close $fh;

    my $ua = LWP::UserAgent->new(agent => "pre-cacher");
    $ua->timeout(20);
    for my $path_query (keys %path_query) {
        # Replay the query through the nginx LB so its cache is filled before users ask
        my $req = HTTP::Request->new(GET => 'http://localhost' . $path_query);
        $req->header('Host' => 'lb-search');
        $ua->request($req);
    }
    sleep 5;
}
```

Slide 42

Slide 42 text

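Search pre-cacher
• Written in 3 minutes by a certain Perl Monger SRE (me)
• When developers send a heavy Solr query, they just add readahead (cachemode=readahead) to the query string and it is pre-fetched automatically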
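The selection rule in the pre-cacher can be pulled out into a pure function, which makes the readahead convention easy to test. A sketch assuming tab-separated LTSV log lines with a uri: field, where the pre-cacher's own requests can be recognised by the string "pre-cacher" (its User-Agent); precache_targets is a hypothetical name.

```perl
use strict;
use warnings;

# Pick the URIs worth pre-fetching from LTSV access-log lines:
#   - skip the pre-cacher's own requests
#   - keep only queries flagged cachemode=readahead (or group.limit ones)
# Returns unique URIs in first-seen order.
sub precache_targets {
    my @lines = @_;
    my (%seen, @uris);
    for my $line (@lines) {
        next if $line =~ /pre-cacher/;
        next unless $line =~ /(?:cachemode=readahead|group\.limit)/;
        next unless $line =~ /(?:^|\t)uri:([^\t\n]+)/;
        push @uris, $1 unless $seen{$1}++;
    }
    return @uris;
}
```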

Slide 43

Slide 43 text

(2/3) CoW in PHP

Slide 44

Slide 44 text

JobWorker in Mercari
[Diagram: App enqueues jobs into Q4M (×2, clustering with consul); a php-parallel-prefork parent fork(2)s child workers that dequeue jobs]

Slide 45

Slide 45 text

JobWorker in Mercari
• Q4M + Parallel::Prefork
• For "grown-up reasons", max-request-per-child is 1
• Speeding up worker processing was the challenge
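With max-request-per-child set to 1, every job pays for a fresh fork(2), so whatever a child has to set up before working is paid again on every job. A minimal sketch of that prefork pattern using only core Perl (fork/waitpid rather than Parallel::Prefork); jobs come from an in-memory list instead of Q4M, and run_prefork is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX ();

# Prefork loop with max-requests-per-child = 1: each job runs in a freshly
# forked child that exits after that one job; at most $max_workers children
# run at once. Returns { job => exit code }.
sub run_prefork {
    my ($max_workers, $handler, @jobs) = @_;
    my %running;        # pid => job
    my %exit_status;    # job => exit code of the child that ran it
    while (@jobs || %running) {
        while (@jobs && keys(%running) < $max_workers) {
            my $job = shift @jobs;
            my $pid = fork;
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: run exactly one job; _exit skips the parent's END blocks
                my $ok = eval { $handler->($job); 1 };
                POSIX::_exit($ok ? 0 : 1);
            }
            $running{$pid} = $job;
        }
        my $pid = waitpid(-1, 0);    # reap whichever child finishes first
        last if $pid <= 0;
        $exit_status{ delete $running{$pid} } = $? >> 8;
    }
    return \%exit_status;
}

my $result = run_prefork(2, sub { sleep 1 }, qw(job1 job2 job3));
print "$_ => $result->{$_}\n" for sort keys %$result;
```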

Slide 46

Slide 46 text

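Copy on Write (CoW)
• On fork(2), memory is not copied from the parent process to the child; a page is copied only when it is first modified
• Saves memory by sharing it with the parent process
• Lowers load by sharing results computed in the parent
• Any mod_perl monger will tell you about it with tears in their eyes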
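A small Perl illustration of the CoW point: the parent builds a lookup table once, and forked children read it without rebuilding it. This is a sketch, not a benchmark; in Perl even reads can dirty some shared pages (reference counts, cached string/number conversions), so the sharing is real but partial.

```perl
use strict;
use warnings;
use POSIX ();

# Built once in the parent. After fork(2) the children see the same pages;
# the kernel copies a page only when one side writes to it.
my %square = map { $_ => $_ * $_ } 1 .. 100_000;

# Fork a child that reads the shared table and reports via its exit code.
sub child_lookup {
    my ($n) = @_;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        POSIX::_exit($square{$n} % 256);    # exit codes are limited to 0..255
    }
    waitpid($pid, 0);
    return $? >> 8;
}

print child_lookup(10), "\n";
```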

Slide 47

Slide 47 text

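CoW in PHP
• No effect with mod_php
  > There is no option to run something when the parent process starts
• Usable from the CLI
• PHP culture is to autoload a class when it is first used
  > So code does not naturally benefit from CoW
• Prove it with a benchmark and get the change adopted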

Slide 48

Slide 48 text

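```php
<?php
// Benchmark: fork 300 children one after another; each child sends one mail.
function t_sendmail() {
    global $params;   // mail parameters (not shown on the slide)
    $smtp = new SimpleMailWithSwift();
    $smtp->send($params);
}

// t_sendmail(); // Sending a mail in the parent preloads the classes

for ($i = 0; $i < 300; $i++) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die('could not fork');
    } else if ($pid) {
        pcntl_wait($status);   // parent: wait for this child before forking the next
    } else {
        // child process
        t_sendmail();
        exit(0);
    }
}
```
Result: 28.639s => 22.048s (parent-side preload commented out => enabled)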

Slide 49

Slide 49 text

CoW in PHP JobWorker
• The parent process of the mail-sending worker sends one dummy mail, which loads the Swift-related classes
• Thanks to CoW, the child processes carry less load and run faster
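The dummy-mail trick forces PHP's lazy autoloading to happen in the parent. In Perl the equivalent is to require (and ideally exercise) the modules before forking. A sketch using the core Data::Dumper module as a stand-in for the Swift mailer classes; preload and child_has_module are hypothetical helpers.

```perl
use strict;
use warnings;
use POSIX ();

# Load modules and run a warm-up callback in the parent, so children inherit
# the compiled code via copy-on-write instead of loading it again each time.
sub preload {
    my ($modules, $warm_up) = @_;
    for my $module (@$modules) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        require $file;
    }
    $warm_up->() if $warm_up;
}

# Fork one child that reports (exit 0 or 1) whether $module is already
# loaded when the child starts.
sub child_has_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    POSIX::_exit(exists $INC{$file} ? 0 : 1) if $pid == 0;
    waitpid($pid, 0);
    return ($? >> 8) == 0;
}

preload(['Data::Dumper'], sub { Data::Dumper->Dump([ { warm => 1 } ]) });
print child_has_module('Data::Dumper') ? "preloaded\n" : "not loaded\n";
```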

Slide 50

Slide 50 text

CoW in PHP JobWorker

Slide 51

Slide 51 text

(3/3) URI Shortening Service

Slide 52

Slide 52 text

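In-house URL shortener
• Intended for large-scale email delivery
  > Millisecond-level performance, so a URL can be generated for every message
• Low latency from anywhere: JP/US/UK
• The host part of the URI is shared across all regions
  > https://example.ly/abcd1234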
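Generating a short code for every message at millisecond speed can be as cheap as encoding an integer ID. A sketch assuming codes come from a numeric ID such as a MySQL AUTO_INCREMENT value (the talk does not say how Mercari generates them), encoded in base62.

```perl
use strict;
use warnings;

# 62 URL-safe symbols; this ordering is an arbitrary choice.
my @ALPHABET = ('0' .. '9', 'a' .. 'z', 'A' .. 'Z');

# Encode a non-negative integer ID as a base62 string.
sub base62 {
    my ($n) = @_;
    die "expected a non-negative integer\n" unless defined $n && $n =~ /\A\d+\z/;
    return $ALPHABET[0] if $n == 0;
    my $code = '';
    while ($n > 0) {
        $code = $ALPHABET[$n % 62] . $code;    # least significant digit first
        $n = int($n / 62);
    }
    return $code;
}

print "https://example.ly/", base62(123_456_789), "\n";
```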

Slide 53

Slide 53 text

Where should we place the infrastructure?

Slide 54

Slide 54 text

RTT between regions
[Diagram: approximate round-trip times among the Ishikari DC, AWS and two GCP sites: 18 ms, 110 ms, 140 ms, 6 ms (rough values)]

Slide 55

Slide 55 text

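RTT between regions
[Diagram: approximate round-trip times among the Ishikari DC, AWS and two GCP sites: 18 ms, 110 ms, 140 ms, 6 ms (rough values)]
The system has to be distributed across regions; lookups are consolidated in one place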

Slide 56

Slide 56 text

Distributed architecture
[Diagram: Ishikari DC, AWS and two GCP sites]

Slide 58

Slide 58 text

Distributed architecture
[Diagram: Ishikari DC, AWS and two GCP sites, with regional hosts j.example.ly, u.example.ly and g.example.ly]

Slide 60

Slide 60 text

Distributed architecture
[Diagram: Ishikari DC, AWS and two GCP sites, with regional hosts j.example.ly, u.example.ly and g.example.ly, plus the shared-host pattern example.ly/.+([a-z])]
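The pattern example.ly/.+([a-z]) suggests that the last letter of a code names the region that issued it, so the shared host can tell from the URL alone which regional API owns a code. A sketch of that routing; the j/u/g mapping to regions is inferred from the diagram's host names and is an assumption, and with_region and regional_host are hypothetical helpers.

```perl
use strict;
use warnings;

# Region letter => regional API host (mapping assumed from the diagram).
my %REGION_HOST = (
    j => 'j.example.ly',    # assumed: JP (Ishikari DC)
    u => 'u.example.ly',    # assumed: US (AWS)
    g => 'g.example.ly',    # assumed: UK (GCP)
);

# Append the issuing region's letter to a short code.
sub with_region {
    my ($code, $region) = @_;
    die "region must be one lowercase letter\n" unless $region =~ /\A[a-z]\z/;
    return $code . $region;
}

# For a shared-host URL such as https://example.ly/abc12j, return the
# regional host that owns the code, or undef if the suffix is unknown.
sub regional_host {
    my ($url) = @_;
    my ($region) = $url =~ m{\Ahttps?://example\.ly/.+([a-z])\z} or return;
    return $REGION_HOST{$region};
}

print regional_host('https://example.ly/' . with_region('abcd1234', 'u')), "\n";
```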

Slide 63

Slide 63 text

In-house URL shortener
• Regional API
  > Go + MySQL
  > Deployed inside the private network
• Global API
  > GAE/Go, to reduce operational effort
  > Deployed in the US region

Slide 64

Slide 64 text

Wrap-up & Site::Reliability::Engineer(ing)? & Perl (monger)?

Slide 65

Slide 65 text

Skills an SRE should have

Slide 66

Slide 66 text

Computer science: of course, YES
But it is also something you can learn along the way

Slide 67

Slide 67 text

Laziness, Impatience, Hubris
A mindset that surely lives in every Perl Monger

Slide 68

Slide 68 text

• Laziness
  > Automating infrastructure and operations
• Impatience
  > Chasing high performance
• Hubris
  > Handling many users, large-scale traffic and diverse data
  > Pride in our work

Slide 69

Slide 69 text

Is Perl a good tool for SRE?
Perl is still used in countless systems around the world, but it is not the latest cool programming language

Slide 70

Slide 70 text

Perl's characteristics

Slide 71

Slide 71 text

• TMTOWTDI
• Powerful string processing
• One-liners
• Runs in many environments
• CPAN
• You can't read other people's code
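As an example of Perl's string handling, parsing LTSV access logs like the one the pre-cacher reads takes only a few lines. A sketch with hypothetical helper names; an equivalent one-liner is given in a comment.

```perl
use strict;
use warnings;

# Parse one LTSV line ("label:value<TAB>label:value...") into a hash ref.
# Values may themselves contain ':' (only the first one splits).
sub parse_ltsv {
    my ($line) = @_;
    chomp $line;
    my %field = map {
        my ($label, $value) = split /:/, $_, 2;
        ($label => $value // '');
    } split /\t/, $line;
    return \%field;
}

# Count responses per HTTP status; lines without a status count as '-'.
sub count_status {
    my %count;
    for my $line (@_) {
        my $status = parse_ltsv($line)->{status} // '-';
        $count{$status}++;
    }
    return \%count;
}

# The same count as a one-liner:
#   perl -nle '$c{$1}++ if /(?:^|\t)status:(\d+)/; END { print "$_\t$c{$_}" for sort keys %c }' access.log
my $counts = count_status("status:200\turi:/", "uri:/x\tstatus:404", "status:200");
print "$_\t$counts->{$_}\n" for sort keys %$counts;
```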

Slide 72

Slide 72 text

DSL for System Call
• C-friendly
• Maps 1:1 onto system calls
  > fork, waitpid, syswrite, sysread, etc.
• Socket handling is straightforward too
• Fewer quirks than Ruby or PHP
• A good language for learning about the OS and networking (I think)
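To illustrate the 1:1 mapping onto system calls, here is a parent and child talking over a pipe using only builtins that wrap pipe(2), fork(2), write(2), read(2) and waitpid(2). A minimal sketch; ask_child is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX ();

# pipe -> pipe(2), fork -> fork(2), syswrite -> write(2),
# sysread -> read(2), waitpid -> waitpid(2).
sub ask_child {
    my ($message) = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                          # child: write the reply and exit
        close $reader;
        my $reply   = uc $message;
        my $written = syswrite($writer, $reply);
        POSIX::_exit(1) unless defined $written && $written == length $reply;
        close $writer;
        POSIX::_exit(0);
    }
    close $writer;                            # parent: read until EOF
    my $buf = '';
    while (1) {
        my $n = sysread($reader, $buf, 4096, length $buf);   # append to $buf
        die "read: $!" unless defined $n;
        last if $n == 0;
    }
    close $reader;
    waitpid($pid, 0);
    return ($buf, $? >> 8);                   # reply and child's exit code
}

my ($reply, $status) = ask_child("hello from the parent");
print "$reply (child exit $status)\n";
```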

Slide 73

Slide 73 text

The best textbook: @kazuho
Starlet, Server-Starter, Parallel::Prefork, HTTP::Parser::XS, Furl, kaztools and many more!!
Perl Mongers are always on the expressway to solving problems

Slide 74

Slide 74 text

How about aiming for SRE as a Perl Monger's next career?

Slide 75

Slide 75 text

“Good problems” are waiting for you, and so are we

Slide 76

Slide 76 text

That's all.