Practicing "SRE" to raise service reliability through development and operations / Mercari SRE in practice
Enterprise Development Conference

kazeburo

September 01, 2017

Transcript

  1. Self-introduction
     • Masahiro Nagano / 長野雅広
     • @kazeburo (twitter/github)
     • Mercari, Inc., Principal Engineer, Site Reliability Engineering (SRE) team
     • BASE, Inc., Technical Advisor
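  2. Self-introduction (career and activities)
     • Career
       • 2006: mixi, application operations team
       • 2010: livedoor (LINE), development support team
       • 2015: current position, SRE
       • More than 15 years of supporting web services from the infrastructure side
     • Talks / writing
       • Spoke at AWS Dev Day Tokyo 2017
       • Wrote an article for WEB+DB PRESS Vol. 100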
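  3. Mercari
     • One of the largest flea-market apps in Japan
     • List an item easily in 3 minutes
       1) Take a photo
       2) Enter the item details
       3) Press the listing button
     • Safe and secure payments and transactions
       • Escrow
       • Mercari sits between the two parties in every exchange of money
       • Anonymous shipping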
  4. Mercari system overview
     [Diagram. Only its text labels survive in the transcript: "List!", "DB", "Search", "TL display", "reflected in search", "large volume of requests", "request/response", "Purchase!", "a few seconds to 30 seconds", "from a few seconds", "images", "payment", "AI", "feedback".]
     • Handles a large volume of transactions quickly and in parallel
  5. Infrastructure
     • Regions: JP, US, UK
     • DNS: Amazon Route53
     • CDN: Akamai, Fastly, ImageFlux
     • Storage: Amazon S3
     • Analysis: Google BigQuery
     • Payment / logistics services (this label appears three times in the diagram)
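  6. What is SRE?
     • Short for Site Reliability Engineering / Engineer
     • Reliability = 信頼性 (the Japanese term for reliability)
     • Proposed by Ben Treynor, who led Google's operations team, as a methodology for system administration and service operations
     • Spreading among companies that run large-scale IT infrastructure, mainly in the US
     • There is no precise definition, but it refers to an engineer or team that "improves the availability, performance, and security of the infrastructure and the service as a whole through software engineering"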
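  7. Google SRE
     • 50% of working time is spent on software engineering
       • Devoted to automation (autonomy) and reliability improvements
       • If the remaining operational work grows past 50%, the team's workload has to be substantially reworked
     • SLAs and error budgets are used to reconcile interests with developers
       • Availability targets are set per service together with the development team
       • While the service is within its error budget, developers release aggressively; once the budget is exceeded, they are expected to concentrate on development that restores reliability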
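The error-budget rule on this slide can be sketched in a few lines. This is a minimal illustration, not Google's or Mercari's tooling: the function names, the 30-day period, and the 99.9% target used in the usage note are all assumptions chosen for the example.

```python
def error_budget_minutes(slo: float, period_minutes: float) -> float:
    """Downtime allowed in the period for an availability target such as 0.999."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    return (1.0 - slo) * period_minutes


def release_decision(slo: float, period_minutes: float, downtime_minutes: float) -> str:
    """Apply the slide's rule: release while inside the budget, else restore reliability."""
    budget = error_budget_minutes(slo, period_minutes)
    if downtime_minutes <= budget:
        return "release"
    return "focus on reliability"


if __name__ == "__main__":
    thirty_days = 30 * 24 * 60  # hypothetical measurement period, in minutes
    print(round(error_budget_minutes(0.999, thirty_days), 1))
    print(release_decision(0.999, thirty_days, downtime_minutes=10))
    print(release_decision(0.999, thirty_days, downtime_minutes=50))
```

With a 99.9% target over 30 days the budget works out to roughly 43 minutes, so 10 minutes of downtime leaves room to keep releasing while 50 minutes does not.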
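  8. Growing expectations for SRE
     • Books / magazines
       • O'Reilly: "SRE サイトリライアビリティエンジニアリング" (SRE: Site Reliability Engineering)
       • Nikkei BP: "Nikkei SYSTEM 2017/7"
     • Feature articles online
       • ITPro: "SRE", a new approach from Google, expands in Japan
         • http://itpro.nikkeibp.co.jp/atcl/column/14/346926/030600869/
       • @IT: Feature: "SRE", a new role demanded of corporate IT departments
         • http://www.atmarkit.co.jp/ait/series/4503/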
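  9. Scope of Mercari SRE's work
     [Diagram: tasks placed on a spectrum between "Operations" and "Software Eng."]
     • Platform build-out
     • OnCall (incident response)
     • Automation
     • Scalability and availability improvements
     • DBA work, middleware setup
     • Application design review
     • Building and operating the log collection and analysis platform
     • Server provisioning and deployment, maintaining the microservice platform
     • Security / fraud detection
     The team is expected to build system operations up as a "mechanism".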
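  10. mackerel: a cloud-based monitoring service
     • A monitoring service provided by Hatena Co., Ltd.
       • Built on Hatena's own server operations know-how
       • Offers a range of APIs and fits well with DevOps
     • Monitored items can be extended with plugins
       • More than 40 plugins developed by the SRE team are in use
       • Beyond server status, it supports external monitoring, visualization of service-related figures, and alert configuration
     • Alert notifications integrate with Slack and various other services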
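A plugin that extends the monitored items can be very small. The sketch below assumes the tab-separated `name<TAB>value<TAB>epoch seconds` line format that mackerel-agent custom metric plugins write to standard output; the metric names and values are hypothetical and are not taken from Mercari's plugins.

```python
import time
from typing import Dict


def format_metric(name: str, value: float, epoch: int) -> str:
    """Build one tab-separated metric line (assumed mackerel-agent plugin format)."""
    return f"{name}\t{value}\t{epoch}"


def collect() -> Dict[str, float]:
    """Return service-level figures to publish.

    Hypothetical placeholder values; a real plugin would query the
    application or a datastore here.
    """
    return {"listing.new_items": 120.0, "listing.purchases": 45.0}


if __name__ == "__main__":
    now = int(time.time())
    for metric_name, metric_value in collect().items():
        print(format_metric(metric_name, metric_value, now))
```

Keeping the collection step separate from the formatting step makes it easy to unit-test the output format without touching the real data source.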
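  11. Notification via PagerDuty
     • Notifications can be sent through a variety of channels and continue until someone responds
       • mail
       • SMS
       • App (iOS, Android)
       • Phone call
     • Operated with the rule "place one phone call once 10 minutes have passed"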
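The notification rule above can be modeled as a small table of delays. This is only a sketch of the idea and does not use the PagerDuty API: the 10-minute phone call comes from the slide, while the zero-minute delays for the other channels are assumptions made for the example.

```python
from typing import List, Tuple

# (delay in minutes, channel). Only the 10-minute phone call is from the slide;
# the immediate delivery of the other channels is an assumption.
RULES: List[Tuple[int, str]] = [
    (0, "mail"),
    (0, "SMS"),
    (0, "app push"),
    (10, "phone call"),
]


def channels_fired(elapsed_minutes: float, acknowledged: bool = False) -> List[str]:
    """Return the channels that have been used after `elapsed_minutes` without a response."""
    if acknowledged:
        # Notifications stop as soon as someone responds.
        return []
    return [channel for delay, channel in RULES if elapsed_minutes >= delay]


if __name__ == "__main__":
    print(channels_fired(5))
    print(channels_fired(10))
    print(channels_fired(10, acknowledged=True))
```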
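  12. Case study: a large-scale password list attack
     • Source countries of the access in an attack that actually took place in 2016
     • The attacker kept switching IP addresses and made only a few login attempts from each one, which makes the attack hard to block automatically
     [Pie chart: share of access by source country]
       India 30%
       Brazil 10%
       Viet Nam 6%
       Taiwan 6%
       Thailand 5%
       Pakistan 5%
       Nepal 3%
       Indonesia 3%
       Russia 2%
       Japan 2%
       Georgia 2%
       Bahrain 2%
       Azerbaijan 2%
       Armenia 2%
       Other 18%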
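The difficulty described on this slide shows up clearly on synthetic data. The sketch below is an illustration only, not Mercari's detection logic: the event format, the per-IP threshold of 5, and the idea of also counting distinct targeted accounts are all assumptions.

```python
from collections import Counter
from typing import List, Tuple

FailedLogin = Tuple[str, str]  # (source IP, targeted account), hypothetical event shape


def ips_over_threshold(events: List[FailedLogin], max_per_ip: int) -> List[str]:
    """Per-IP rule: return the IPs with more than `max_per_ip` failed logins."""
    counts = Counter(ip for ip, _account in events)
    return sorted(ip for ip, n in counts.items() if n > max_per_ip)


def distinct_accounts_targeted(events: List[FailedLogin]) -> int:
    """Aggregate view: how many different accounts saw a failed login."""
    return len({account for _ip, account in events})


if __name__ == "__main__":
    # Synthetic attack: 1000 source IPs, each trying only 3 accounts.
    events = [(f"ip-{i}", f"user-{i}-{j}") for i in range(1000) for j in range(3)]
    print(ips_over_threshold(events, max_per_ip=5))
    print(distinct_accounts_targeted(events))
```

No single IP crosses the per-IP threshold, yet thousands of accounts are being probed, so a rule keyed only on the source IP never fires. An aggregate signal such as the number of distinct accounts with failed logins is one possible complement.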