Slide 1

Slide 1 text

From Infrastructure Team to SRE
 A New Approach to Infrastructure that Supports Mercari
 Masahiro Nagano @kazeburo
 Developers Summit 2018/2/16

Slide 2

Slide 2 text

Me
• Masahiro Nagano / 長野雅広
• @kazeburo (twitter/github)
• Mercari, Inc., Principal Engineer, Site Reliability Engineering (SRE) team
• BASE, Inc., Technical Advisor

Slide 3

Slide 3 text

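Me
• ~ 2006: Joined a startup in Kyoto
  • A handful of engineers
  • Looked after the infrastructure while doing development; also did hands-on DC work
  • A cycle of tuning the application and using the freed resources to add new features
• 2006 ~: mixi
  • "Application operations team": operations without going to the DC
  • Large-scale image delivery and application tuning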

Slide 4

Slide 4 text

Me
• 2010 ~: livedoor (NHN Japan => LINE)
  • Improved infrastructure and performance across livedoor and LINE family services
  • MySQL tuning for livedoor Blog / Plack optimization
• 2015/02 ~ : mercari

Slide 5

Slide 5 text

Recent activities
• Talks
  • AWS Dev Day Tokyo 2017
  • YAPC::Fukuoka 2017, YAPC::Hokkaido 2016
  • YAPC::Okinawa 2018, Manabiya Teratail Developer Days (scheduled)
• Articles
  • WEB+DB PRESS Vol.88, Vol.92-97 (serial), Vol.100
  • Nikkei SYSTEMS, July 2017 issue; ITPro

Slide 6

Slide 6 text

AGENDA
• Self-introduction
• Encountering SRE
• About Mercari
• What is SRE?
• Mercari SRE: case studies and what comes next

Slide 7

Slide 7 text

Encountering SRE: Why SRE?

Slide 8

Slide 8 text

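Infrastructure engineer?
• At mixi I was on the "application operations team"
• The infrastructure (data center) team was a separate team
• Our (team's) role: draw out the full capacity of the servers the data center team provided, and run the code written by application engineers in the best possible way
• Service availability is the responsibility of the team that handles software, not the hardware team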

Slide 9

Slide 9 text

Operations engineer?
• "Web Operations", published in 2010
• Essays on operations: continuous deployment, DevOps, automation, monitoring, and more
• Many people regard operations as routine work

Slide 10

Slide 10 text

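Encountering SRE
• 2012/7: Learned about it in an IRC conversation with a friend
  • SRE is the team responsible for the uptime and stability of Google's huge infrastructure and services
  • https://research.googleblog.com/2012/07/site-reliability-engineers-solving-most.html
    "Site Reliability Engineers: 'solving the most interesting problems'" (around the time this article was published)
• Added "Site Reliability" to my twitter bio and presentation slides to keep it in mind
  • https://www.slideshare.net/kazeburo/yapc2102mysql/2 (2012/9)
• 2015/11: Proposed it as a team name at Mercari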

Slide 11

Slide 11 text

About Mercari

Slide 12

Slide 12 text

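Mercari
• One of the largest flea-market apps in Japan
• List an item easily in 3 minutes: 1) take a photo 2) enter the item details 3) press the listing button
• Safe and secure payments and transactions
  • Escrow (our company sits in between for the exchange of money)
  • Anonymous shipping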

Slide 13

Slide 13 text

Expansion to the US / UK: JP, UK, US

Slide 14

Slide 14 text

KPI
• Downloads: over 100 million (worldwide)
• GMV (gross merchandise volume): over 10 billion yen per month
• Listings: over 1 million items per day

Slide 15

Slide 15 text

Mercari system overview
[Diagram labels: List an item!, DB, Search, timeline (TL) display, search index update, large volume of requests, request response, Purchase!, a few seconds to 30 seconds, a few seconds or more, images, payment, AI]
Handles a large volume of transactions concurrently at high speed

Slide 16

Slide 16 text

Infrastructure
• Multi-cloud configuration
• JP is built mainly on Sakura Internet, US on AWS, and UK on GCP
• In addition, JP and US combine GCP to build the foundation for "microservices"

Slide 17

Slide 17 text

Infrastructure
DNS: Amazon Route53
CDN: Akamai, Fastly, ImageFlux
Storage: Amazon S3
Analysis: Google BigQuery
Monitoring: Mackerel, DataDog
[Diagram: JP, UK, US]

Slide 18

Slide 18 text

Microservices platform
[Diagram labels: API Gateway, search, backend service, offer, monolith API; JP, US]
• Developed an API Gateway that wraps the existing API (the monolith API) and built it on GCP (GKE)
• New features are developed outside the monolith API
• Services are broken out into microservices step by step
• Backend services called from the monolith API and the microservices also run on GKE
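The gateway's role of wrapping the monolith while services are split out step by step can be pictured as a routing decision. The following is only a minimal sketch under stated assumptions: the upstream names, the path prefixes, and the longest-prefix rule are hypothetical illustrations and are not taken from the talk.

```python
from typing import Dict

# Hypothetical upstream name for the existing (monolith) API.
MONOLITH = "monolith-api"


def pick_upstream(path: str, migrated: Dict[str, str]) -> str:
    """Choose where a request goes.

    `migrated` maps path prefixes to microservice names (all hypothetical).
    The longest matching prefix wins; anything not yet migrated falls
    through to the monolith, so the split can proceed one prefix at a time.
    """
    best = ""
    for prefix in migrated:
        if path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return migrated[best] if best else MONOLITH


if __name__ == "__main__":
    routes = {"/search": "search-service", "/offer": "offer-service"}
    print(pick_upstream("/search/items", routes))
    print(pick_upstream("/users/1", routes))
```

In a scheme like this, moving a feature out of the monolith amounts to adding one entry to the route table.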

Slide 19

Slide 19 text

What is SRE? A recap

Slide 20

Slide 20 text

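What is SRE?
• Proposed by Ben Treynor, who led Google's operations team, as a methodology for system administration and service operations
• Spreading among companies that operate large-scale IT infrastructure, mainly in the US
• There is no precise definition, but it refers to engineers/teams, and a way of organizing, that "improve the availability, performance, and security of the infrastructure and services as a whole through software engineering"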

Slide 21

Slide 21 text

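Google SRE
• Google's SREs are expected to have system and operations skills in addition to software engineering
• Software engineering effort is focused especially on "automation"
  • SRE headcount is not scaled in proportion to service size (not realistically possible even at Google)
  • Eliminate "toil" (work that is done by hand, can be automated, and has no value in being repeated)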

Slide 22

Slide 22 text

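Google SRE
• 50% of working time is spent on software engineering
  • Spent on automation (autonomy) and improving reliability
  • If operational work exceeds 50%, the team is forced to review its workload
• SLAs and error budgets reconcile interests with developers
  • An availability target (SLA) is set per service together with the developer team; it is not set too high
  • While within the error budget, developers release aggressively; when the budget is exceeded, they are required to focus on development that restores reliability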
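The error-budget rule can be made concrete with a little arithmetic. This is a minimal sketch, assuming a hypothetical 99.9% availability target over a 30-day window; the function names, the window, and the numbers are illustrative and do not come from the talk.

```python
def error_budget_minutes(target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability target such as 0.999."""
    return (1.0 - target) * window_days * 24 * 60


def release_policy(target: float, downtime_minutes: float,
                   window_days: int = 30) -> str:
    """Map budget consumption to the policy described on the slide.

    Within budget: developers keep releasing aggressively.
    Over budget: work shifts to restoring reliability.
    """
    budget = error_budget_minutes(target, window_days)
    return "release" if downtime_minutes <= budget else "focus-on-reliability"


if __name__ == "__main__":
    # 0.1% of 30 days is roughly 43 minutes of allowed downtime.
    print(round(error_budget_minutes(0.999), 1))
    print(release_policy(0.999, downtime_minutes=20))
    print(release_policy(0.999, downtime_minutes=60))
```

The same arithmetic shows why a target should not be set too high: each extra nine shrinks the budget tenfold, leaving little room for releases.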

Slide 23

Slide 23 text

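SRE in Japan
• November 2015: SRE introduced on the Mercari engineering blog
• Adoption of SRE is progressing, mainly at web companies such as Retty, Cybozu, CookPad, Mixi, and Hatena
• SRE Tech Talk held
  • First: June 2016. Second: January 2017
  • Drew more than 100 participants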

Slide 24

Slide 24 text

SRE in Japan
• Books / magazines
  • O'Reilly, "SRE サイトリライアビリティエンジニアリング" (Site Reliability Engineering)
  • Nikkei BP, "Nikkei SYSTEMS 2017/7"

Slide 25

Slide 25 text

Mercari SRE

Slide 26

Slide 26 text

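Why SRE at Mercari?
• 2015/11: Renamed from the infrastructure team to SRE
• For customers to keep using Mercari for a long time, reliability ("always comfortable and safe to use") is essential. SRE encompasses this reliability
• The core of the work is improving service performance and availability through software engineering, and automating deployment and similar tasks
• Also intended as a forward-looking initiative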

Slide 27

Slide 27 text

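Mercari SRE
• 10 members as of 2018/2
  • Including engineers working on the microservices platform and Sys-ML (MLOps)
• Most are mid-career hires with experience operating large-scale web services, but there are also new-graduate members
• Individual members proactively find problems and solve them
• Information is shared through discussion on Slack and GitHub and ticket management in Jira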

Slide 28

Slide 28 text

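Creating tickets from Slack
• Automates and simplifies Jira ticket creation
• Create a ticket and share the issue the moment it comes to mind
• Sometimes a team member who sees it on Slack solves it right away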

Slide 29

Slide 29 text

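Scope of Mercari SRE's work
[Diagram labels: Operations, Software Eng., platform building]
OnCall (incident response)
Handling requests
Scalability and availability improvement
 Automation, DBA, middleware setup
Application design review
Building and operating the log collection and analysis platform
Server provisioning and deployment; building out the microservices and ML platforms
Security / fraud detection
We are expected to build system operations as a "mechanism"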

Slide 30

Slide 30 text

Recent case study

Slide 31

Slide 31 text

CruiseControl: controlling load in a Worker/Queue system

Slide 32

Slide 32 text

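Problems with Worker/Queue systems
• Job processing speed varies with many factors
  • Enqueue rate from batch jobs
  • Number of worker processes
  • What the job does
• Processing that runs too fast puts load on the system
[Diagram labels: App, App, Queue, RDBMS, Worker x4]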

Slide 33

Slide 33 text

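Case: in-house CRM tool
• Delivery speed is determined by the choice of delivery medium
  • Writes to the RDBMS (in-app PM)
  • Mail
  • Writes to the RDBMS (in-app notifications)
  • Push delivery
• Processing speed is not constant; it changes with each delivery
• The number of workers was adjusted by hand to shorten delivery time
• Missed adjustments caused unexpected load and incidents
[Diagram labels: CRM, Queue, RDBMS, Worker x4, Mail, Push; slow, fast]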

Slide 34

Slide 34 text

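CruiseControl
• Built a "service that controls speed"
• Workers query CruiseControl before starting a job
  • If processing is running too fast, a wait is inserted
  • With enough workers, processing speed becomes constant
• Achieved both fast delivery from the CRM tool and stable service operation
[Diagram labels: CRM, App, Queue, Worker x3, CruiseControl, RDBMS, Mail, Push]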
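The worker side of this design can be sketched as a loop that asks the limiter for permission before each job. This is a minimal sketch, not the talk's implementation: the URL and the `X-Limit-Req` header mirror the curl example in the deck, while the function names, the timeout, and the assumption that the endpoint answers with a 2xx response once the wait is over are hypothetical.

```python
import urllib.request
from typing import Callable, Iterable


def ask_cruisecontrol(key: str,
                      url: str = "http://cruisecontrol:8080/r100",
                      timeout: float = 300.0) -> None:
    """Block until the rate limiter answers.

    The limiter delays its response when requests for `key` arrive too
    fast, so simply waiting for the reply is what slows the worker down.
    Assumes a 2xx reply; urllib raises on other statuses.
    """
    req = urllib.request.Request(url, headers={"X-Limit-Req": key})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()


def run_worker(jobs: Iterable[str],
               handle: Callable[[str], None],
               throttle: Callable[[str], None] = ask_cruisecontrol,
               key: str = "push-msg") -> int:
    """Process jobs one by one, asking the throttle before each job."""
    done = 0
    for job in jobs:
        throttle(key)  # waits here when the fleet as a whole is too fast
        handle(job)
        done += 1
    return done
```

Because every worker shares the same key, adding workers does not raise the overall rate beyond the configured limit; it only ensures the limit is actually reached.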

Slide 35

Slide 35 text

CruiseControl = NGINX
• Uses ngx_http_limit_req_module
• Controls speed by path and header

limit_req_zone $http_x_limit_req zone=r10:50m rate=10r/s;
limit_req_zone $http_x_limit_req zone=r50:50m rate=50r/s;
limit_req_zone $http_x_limit_req zone=r100:50m rate=100r/s;
server {
    listen 8080;
    root /path/to/root;
    location /r10 {
        limit_req zone=r10 burst=4294967296;
    }
    location /r50 {
        limit_req zone=r50 burst=4294967296;
    }
    location /r100 {
        limit_req zone=r100 burst=4294967296;
    }
}

% curl -H 'X-Limit-Req: push-msg' cruisecontrol:8080/r100
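With a very large burst and no `nodelay`, excess requests are delayed rather than rejected, which is what turns the limiter into a pacing service. The sketch below is a simplified model of that behavior, not nginx's actual algorithm: it assumes requests sharing one key are released in arrival order, no closer together than 1/rate seconds. The function name is hypothetical.

```python
from typing import List


def release_times(arrivals: List[float], rate_per_sec: float) -> List[float]:
    """Times at which a delaying limiter lets each request through.

    Simplified model: nothing is rejected (burst is effectively
    unlimited) and consecutive releases are at least 1/rate apart.
    """
    interval = 1.0 / rate_per_sec
    released: List[float] = []
    next_free = float("-inf")
    for t in sorted(arrivals):
        start = max(t, next_free)  # wait only if we are ahead of the pace
        released.append(start)
        next_free = start + interval
    return released


if __name__ == "__main__":
    # Four requests arriving at once, limited to 2 requests per second.
    print(release_times([0.0, 0.0, 0.0, 0.0], rate_per_sec=2.0))
    # prints [0.0, 0.5, 1.0, 1.5]
```

Requests that arrive slower than the configured rate pass through without delay, so the limiter only acts when workers collectively run too fast.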

Slide 36

Slide 36 text

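What CruiseControl achieves
• Systematizes the craftsman-like practice of changing the number of workers while watching system load and queue length
• Also functions as an SLA for queue processing speed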

Slide 37

Slide 37 text

Summary

Slide 38

Slide 38 text

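Summary
• From infrastructure team to SRE
• SRE improves service performance and availability through systems knowledge and software engineering
• Turn system operations into mechanisms
• Software is also what supports business growth, for example the microservices and ML platforms
• Provide customers with reliability: "always comfortable and safe to use"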

Slide 39

Slide 39 text

SRE More!!! https://twitter.com/kazeburo/status/890131903529054210

Slide 40

Slide 40 text

That's all => www.mercari.com/jp/jobs/ speakerdeck.com/kazeburo

Slide 41

Slide 41 text

No content