From Infrastructure Team to SRE / SRE in Mercari, Developers Summit 2018
by
kazeburo
Slide 1
Slide 1 text
From Infrastructure Team to SRE: a new approach to the infrastructure that supports Mercari. Masahiro Nagano (@kazeburo), Developers Summit 2018/2/16
Slide 2
Slide 2 text
Me
• Masahiro Nagano / 長野雅広
• @kazeburo (twitter/github)
• Mercari, Inc.: Principal Engineer, Site Reliability Engineering (SRE) team
• BASE, Inc.: technical advisor
Slide 3
Slide 3 text
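Me
• ~2006: joined a startup
  • A handful of engineers
  • Looked after the infrastructure while also doing development; did data center work too
  • A cycle of tuning the application and using the freed-up resources to add new features
• 2006~: mixi
  • "Application operations team": operations without going to the data center
  • Tuning a large-scale image delivery application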
Slide 4
Slide 4 text
Me
• 2010~: livedoor (NHN Japan => LINE)
  • Improved infrastructure and performance across the livedoor and LINE family of services
  • MySQL tuning / Plack optimization for livedoor Blog
• 2015/02~: mercari
Slide 5
Slide 5 text
Recent activities
• Talks
  • AWS Dev Day Tokyo 2017
  • YAPC::Fukuoka 2017, YAPC::Hokkaido 2016
  • Upcoming: YAPC::Okinawa 2018, Manabiya (teratail Developer Days)
• Articles
  • WEB+DB PRESS Vol.88, serialized in Vol.92-97, Vol.100
  • Nikkei SYSTEMS, July 2017 issue; ITPro
Slide 6
Slide 6 text
AGENDA
• Self-introduction
• How I met SRE
• About Mercari
• What is SRE?
• Mercari SRE: case studies and what's next
Slide 7
Slide 7 text
How I met SRE: why SRE?
Slide 8
Slide 8 text
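Infrastructure engineer?
• At mixi: the "application operations team"
• A separate infrastructure (data center) team existed
• Our team's role: draw out the full capability of the servers the data center team provided, and run the code written by application engineers in the best possible way
• Service availability is the job of the team that handles the software, not the hardware team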
Slide 9
Slide 9 text
Operations engineer?
• "Web Operations", published 2010
• Essays on operations: continuous deployment, DevOps, automation, monitoring, and more
• Many people see operations as routine work
Slide 10
Slide 10 text
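How I met SRE
• 2012/7: learned about it from a friend in an IRC conversation
  • SRE is the team at Google responsible for keeping its huge infrastructure and services running and stable
  • This was around when the article "Site Reliability Engineers: 'solving the most interesting problems'" was published: https://research.googleblog.com/2012/07/site-reliability-engineers-solving-most.html
• Added "Site Reliability" to my Twitter bio and talk slides to keep it in mind
  • https://www.slideshare.net/kazeburo/yapc2102mysql/2 (2012/9)
• 2015/11: proposed it as the team name at Mercari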
Slide 11
Slide 11 text
About Mercari
Slide 12
Slide 12 text
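Mercari
• One of the largest flea market apps in Japan
• List an item easily in 3 minutes: 1) take a photo 2) fill in the details 3) press the List button
• Safe and secure payments and transactions
  • Escrow (Mercari sits in the middle of the money exchange)
  • Anonymous shipping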
Slide 13
Slide 13 text
Expansion to the US and UK (JP / UK / US)
Slide 14
Slide 14 text
KPI
• Downloads: over 100 million (worldwide)
• GMV (gross merchandise volume): over ¥10 billion per month
• Listings: over 1 million per day
Slide 15
Slide 15 text
Mercari system overview
[Diagram: listings ("List it!") flow into the DB and Search for display and search indexing; a large volume of requests is served from the DB, Search, images, payments, and AI when users buy ("Buy it!"); labels indicate latencies from seconds up to 30 seconds]
• Handles a large volume of transactions quickly and in parallel
Slide 16
Slide 16 text
Infrastructure
• Multi-cloud setup
  • JP centered on Sakura Internet, US on AWS, UK on GCP
  • In addition, JP and US use GCP to build the platform for "microservices"
Slide 17
Slide 17 text
Infrastructure (JP / UK / US)
• DNS: Amazon Route53
• CDN: Akamai, Fastly, ImageFlux
• Storage: Amazon S3
• Analysis: Google BigQuery
• Monitoring: Mackerel, Datadog
Slide 18
Slide 18 text
Microservices platform
[Diagram: an API Gateway in front of the monolith API and services (search, backend, service, offer) for JP and US]
• Developed an API Gateway that wraps the existing API (the monolith API) and built it on GCP (GKE)
• New features are developed outside the monolith API
• The service is gradually broken down into microservices
• Backend services called from the monolith API and from microservices also run on GKE
Slide 19
Slide 19 text
What is SRE, revisited
Slide 20
Slide 20 text
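What is SRE?
• Proposed by Ben Treynor, who led Google's operations team, as an approach to system administration and service operations
• Spread to companies running large-scale IT infrastructure, mainly in the US
• There is no strict definition, but it describes engineers, teams, and a way of organizing that "improve the availability, performance, and security of the infrastructure and the service as a whole through software engineering"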
Slide 21
Slide 21 text
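Google SRE
• Google's SREs need systems and operations skills in addition to software engineering
• Their software engineering focuses especially on "automation"
• SRE headcount is not scaled in proportion to the size of the service (at Google that would be impractical)
• Eliminate "toil": work that is manual, automatable, repetitive, and without lasting value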
Slide 22
Slide 22 text
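Google SRE
• Spend 50% of working hours on software engineering
  • Devoted to automation (making systems self-running) and improving reliability
  • If operational work goes beyond 50%, the team's workload must be reassessed
• Balance interests with developers through SLAs and error budgets
  • Set an availability target (SLA) per service together with the development team; do not set it too high
  • While error budget remains, developers release aggressively; once the budget is exceeded, they are expected to concentrate on work that restores reliability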
Slide 23
Slide 23 text
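SRE in Japan
• November 2015: SRE introduced on the Mercari tech blog
• Adoption of SRE spreads, mainly among web companies such as Retty, Cybozu, Cookpad, mixi, and Hatena
• SRE Tech Talks held
  • 1st: June 2016; 2nd: January 2017
  • Drew more than 100 attendees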
Slide 24
Slide 24 text
SRE in Japan
• Books and magazines
  • O'Reilly Japan: 「SRE サイトリライアビリティエンジニアリング」 (Japanese edition of Site Reliability Engineering)
  • Nikkei BP: Nikkei SYSTEMS, July 2017
Slide 25
Slide 25 text
Mercari SRE
Slide 26
Slide 26 text
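Why SRE at Mercari?
• 2015/11: renamed the infrastructure team to SRE
• For customers to keep using Mercari, reliability ("comfortable and safe to use at any time") is essential, and the name SRE contains that word
• Core work: improving service performance and availability through software engineering, and automating deployment and similar tasks
• Meant as a forward-looking initiative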
Slide 27
Slide 27 text
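Mercari SRE
• 10 members as of 2018/2
• Includes engineers working on the microservices platform and Sys-ML (MLOps)
• Most are mid-career hires with experience running large web services, but new graduates are on the team too
• Each member proactively finds problems and solves them
• Information is shared through discussions on Slack and GitHub and ticket management in Jira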
Slide 28
Slide 28 text
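Creating tickets from Slack
• Automate and simplify Jira ticket creation
• Create a ticket and share the issue the moment it comes to mind
• Sometimes a team member sees it on Slack and solves it right away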
Slide 29
Slide 29 text
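Scope of Mercari SRE work (spanning Operations, Software Eng., and platform building)
• OnCall (incident response)
• Handling requests from other teams
• Scalability and availability improvements
• Automation, DBA, middleware setup
• Application design reviews
• Building and operating log collection and analysis platforms
• Server provisioning and deployment; building out the microservices and ML platforms
• Security / fraud detection
We are expected to turn system operations into a "mechanism"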
Slide 30
Slide 30 text
Recent case study
Slide 31
Slide 31 text
CruiseControl: controlling the load of Worker/Queue systems
Slide 32
Slide 32 text
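Problems with Worker/Queue systems
• The number of jobs processed changes with many factors
  • Enqueues from batches
  • Number of worker processes
  • What each job does
• Processing too fast puts load on the system
[Diagram: App, App → Queue → Worker ×4 → RDBMS]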
Slide 33
Slide 33 text
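Example: the CRM tool
• Delivery speed depends on the chosen delivery media
  • Writes to the RDBMS (in-app private messages)
  • Mail
  • Writes to the RDBMS (in-app notifications)
  • Push notifications
• Processing speed is not constant and varies with each delivery
• The number of workers was adjusted by hand to shorten delivery time
• Missed adjustments caused unexpected load and incidents
[Diagram: CRM → Queue → Worker ×4 → RDBMS, Mail, Push at high speed]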
Slide 34
Slide 34 text
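CruiseControl
• Built a "service that controls speed"
• Workers query CruiseControl before they start processing a job
• If processing is too fast, a wait is inserted
• With enough workers, the processing rate becomes constant
• Achieves both fast delivery from the CRM tool and stable operation of the service
[Diagram: CRM, App → Queue → Worker ×3, each consulting Cruise Control → RDBMS, Mail, Push]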
Slide 35
Slide 35 text
CruiseControl = NGINX
• Uses ngx_http_limit_req_module
• Controls the rate by path and header

    limit_req_zone $http_x_limit_req zone=r10:50m rate=10r/s;
    limit_req_zone $http_x_limit_req zone=r50:50m rate=50r/s;
    limit_req_zone $http_x_limit_req zone=r100:50m rate=100r/s;

    server {
        listen 8080;
        root /path/to/root;
        location /r10 {
            limit_req zone=r10 burst=4294967296;
        }
        location /r50 {
            limit_req zone=r50 burst=4294967296;
        }
        location /r100 {
            limit_req zone=r100 burst=4294967296;
        }
    }

    % curl -H 'X-Limit-Req: push-msg' cruisecontrol:8080/r100
Slide 36
Slide 36 text
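What CruiseControl achieves
• Turns the craftsman-like practice of changing the number of workers while watching system load and queue length into a system
• Also works as an SLA for queue processing speed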
Slide 37
Slide 37 text
Summary
Slide 38
Slide 38 text
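Summary
• From infrastructure team to SRE
• SRE improves service performance and availability through knowledge of systems and software engineering
  • Turning system operations into mechanisms
  • Software, such as the microservices and ML platforms, that supports business growth
• Provide customers with reliability: "comfortable and safe to use at any time"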
Slide 39
Slide 39 text
SRE More!!! https://twitter.com/kazeburo/status/890131903529054210
Slide 40
Slide 40 text
That's all => www.mercari.com/jp/jobs/ speakerdeck.com/kazeburo